Data-Driven Learning Models for Internet of Things Security: Emerging Trends, Applications, Challenges and Future Directions

Alimi, Oyeniyi Akeem

doi:10.3390/technologies13050176

Open Access Review


by Oyeniyi Akeem Alimi

Department of Information Systems, Durban University of Technology, Durban 4001, South Africa

Technologies 2025, 13(5), 176; https://doi.org/10.3390/technologies13050176

Submission received: 12 March 2025 / Revised: 19 April 2025 / Accepted: 25 April 2025 / Published: 29 April 2025

(This article belongs to the Special Issue IoT-Enabling Technologies and Applications)


Abstract

The prospect of integrating every object into a unified infrastructure, which enables humans to monitor, access, and control objects and systems, has played a significant role in the geometric growth of the Internet of Things (IoT) paradigm across various applications. However, despite the numerous possibilities that the IoT paradigm offers, security and privacy within and between interconnected devices and systems are integral to the long-term growth of IoT networks. Various sophisticated intrusions and attack variants continue to threaten the sustainability of IoT technologies and networks. Thus, effective methodologies for the prompt identification, detection, and mitigation of these threats are a priority for stakeholders. Recently, data-driven artificial intelligence (AI) models have proven effective in numerous applications. Hence, recent studies have proposed various single and ensemble AI models, such as deep learning and reinforcement learning models, to support effective decision-making for the secure operation of IoT networks. Considering these growth trends, this study presents a critical review of recently published articles in which learning models were proposed for IoT security analysis. The aim is to highlight emerging IoT security issues, current conventional strategies, methodological procedures, achievements, and, importantly, the limitations and research gaps identified in those IoT security studies. By doing so, this study provides a research-based resource for scholars researching IoT and general industrial control systems security. Finally, some research gaps, as well as directions for future studies, are discussed.

Keywords:

machine learning; deep learning; reinforcement learning; denial of service; IoT intrusion analysis dataset; cyberattacks; intruder detection system; Internet of Things; IoT architecture

1. Introduction

In recent years, the Internet of Things (IoT) paradigm has remained an important topic across many fields. Its applications span various domains, ranging from home automation to industry, health, and education [1,2,3]. Owing to technological advancements, the global adoption rate of IoT devices and tools is increasing geometrically [4]. Figure 1 presents the projected number of IoT-connected devices globally over the next decade [5]. As shown in Figure 1, the number of IoT-connected devices in 2033 is expected to be double the current figure. However, while the adoption of IoT tools and associated advanced technologies continues to grow exponentially, it also brings a series of challenges in terms of cyber-intrusions and attacks [6]. Intruders can exploit various vulnerabilities to access sensitive data, launch a variety of sophisticated attacks on networks, and/or assume control of critical infrastructure [7,8]. The secure operation of IoT networks is a top priority for stakeholders, including governments and standard manufacturers, as most social, political, and economic activities are potentially at risk [9,10]. As the number of connected devices soars, so does the potential for security breaches and intrusions. Adversaries can physically attack or tamper with network end nodes, software, or communication standards and protocols, thereby inflicting severe damage on users [10,11,12,13].

In recent years, various documented incidents, such as the 2021 Verkada hack [14], the 2016 Lappeenranta cyberattack [15], the 2019–2021 Mozi botnet attack [3], and the 2016 Mirai botnet attack, have shown the devastating consequences of data breaches, intrusions, and attacks. In 2018, the United States Food and Drug Administration issued a statement on vulnerabilities in two models of Medtronic programmers used with cardiac implantable electrophysiology devices, including pacemakers, implantable defibrillators, and implantable cardiac monitors. It reported that attackers could potentially hack these devices and manipulate their features [16]. Various studies have discussed similar episodes [17,18,19,20,21]. The World Economic Forum identified cyber threats as a key threat to the global economy in its 2019 Global Risks Report [22]. The global consulting firm McKinsey and Company estimated that damages from cyberattacks will reach USD 10.5 trillion annually by 2025, a 300% increase compared with the figure a decade earlier [23]. As expected, expenditure on cybersecurity has soared in response. The IoT security market is expected to expand from USD 35.6 billion in 2024 to USD 383.11 billion by 2034, a compound annual growth rate (CAGR) of 26.82% over the period [24]. Figure 2 presents the projected IoT security market size over the next five years [24]. As shown in Figure 2, the market is projected to grow at a geometric rate; by 2030, it is expected to be triple its 2025 size.
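The market-size projections quoted above can be cross-checked with the standard compound-growth relation, CAGR = (end / start)^(1 / years) - 1. The short Python sketch below applies it to the figures cited from [24]; the helper names `cagr` and `project` are illustrative, and the calculation only checks the internal consistency of the projections, not their accuracy.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `start_value` at `rate` for `years` years."""
    return start_value * (1.0 + rate) ** years


# Figures quoted in the text: USD 35.6 billion (2024) -> USD 383.11 billion (2034)
rate = cagr(35.6, 383.11, 10)
print(f"implied CAGR: {rate:.2%}")  # close to the reported 26.82%

# Growth multiple over a five-year horizon at the reported 26.82% rate
multiple_5y = project(1.0, 0.2682, 5)
print(f"5-year growth multiple: {multiple_5y:.2f}x")  # roughly 3.3x
```

At 26.82% per year, a five-year horizon yields a multiple of roughly 3.3, which is consistent with the statement that the 2030 market will be about triple its 2025 size.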

Various studies have identified important security requirements, including adequate authorization and authentication, access control, monitoring and tracking systems, data and information integrity, mutual trust, and privacy, as essential for mitigating cyber-attacks and associated risks [2,7,13,15,25]. Over the years, various traditional models and variations of popular techniques, including tamper-proofing, secure elements, and end-to-end encryption, have been proposed to achieve these goals. Table 1 presents a comparison of some popular traditional security methods deployed across different layers/domains of typical IoT networks.

Despite satisfactory performance in some cases, these methods and models have proven computationally expensive and slow to adapt to the heterogeneous and diverse nature of IoT networks. The International Data Corporation estimated that the total volume of global data will reach 175 ZB by 2025, of which 90 ZB will be generated by IoT end nodes [26]. Traditional methods are overburdened and unreliable given the massive data generated by complex IoT network topologies. Additionally, because emerging cyber-attacks, intrusions, and data breaches are conducted in a coordinated, furtive, and sophisticated manner, conventional methods are systemically ill-equipped to tackle them effectively. In recent times, data-driven automation models, particularly variants of machine learning, including deep learning and reinforcement learning models, have been extensively used to model and monitor complex applications. Unlike conventional schemes, data-driven models can learn the dynamic characteristics of IoT networks and applications. Using generated datasets containing millions of instances with numerous features, researchers have recently proposed automation models that identify and respond to intrusions without the need for constant human intervention.

In line with increasing IoT security and vulnerability concerns, several studies have reviewed and analyzed these problems from various points of view. Alaba et al. [2] reviewed varieties of IoT threats and vulnerabilities in the context of application, architecture, and communication. In a related study, HaddadPajouh et al. [27] reviewed IoT security requirements, challenges, and solutions based on the three-layer architecture. Similarly, Kouicem et al. [28] surveyed proposed security and privacy solutions for IoT networks. Yang et al. [29] analyzed biometrics research in IoT security, with a focus on authentication and encryption. Sadhu et al. [30] reviewed varieties of IoT security and privacy menaces, including some emerging countermeasures. The authors of [31] presented a systematic review of Industrial IoT (IIoT) security, with a special focus on how fog computing can address IIoT security requirements. With regard to data-driven approaches to complex network security challenges, the authors of [10] analyzed recent studies in which supervised learning techniques were modeled for SCADA intrusion solutions. In a similar earlier study, the authors of [6] reviewed studies that proposed machine learning techniques for smart power system menaces.

To address these gaps, this paper analyzes emerging security challenges and solutions based on the IoT architectural layers. Adopting a narrative literature review approach, it reviews the most recent studies in which emerging data-driven models are proposed for IoT security analysis. The analysis covers key aspects, from data generation and collection to the classification phase. Pertinent issues are discussed extensively, with emphasis on both the successes and the limitations of the methodologies proposed in the reviewed articles. To ensure the credibility and reliability of the analyzed studies, a comprehensive search of the topic and research questions was conducted using multiple highly rated databases, such as Web of Science, Scopus, IEEE Xplore, the ACM Digital Library, and Google Scholar. The analyzed peer-reviewed articles were all published within the last five (5) years. The relevance of the reviewed articles is justified based on their quality and methodology, using a structured narrative synthesis approach. The paper is intended for IoT communication, application, and security researchers who aim to build analytics and AI solutions for IoT infrastructure using emerging data-driven approaches. Unlike previous review articles, this study gathers different approaches, procedures, and limitations, and identifies research gaps in data-driven IoT security studies. In doing so, it provides a research-based resource for scholars researching IoT and general industrial control systems security.

Specifically, the novel contributions of the review paper are stated briefly as follows: (1) A detailed analysis of the most recent state-of-the-art data-driven approaches and their applicability in the IoT security domain; (2) Emerging security challenges and solutions for IoT networks based on the IoT architectural layers are extensively discussed; (3) An elaborate review of the most recent benchmark datasets and pre-processing steps, as well as trending classifier models deployed in IoT security analysis; (4) Open challenges, limitations, and research gaps of the current applications of data-driven models in IoT studies and the directions for future research in this regard.

The remainder of the paper is organized as follows. An overview of IoT network security is presented in Section 2, which analyzes the security challenges associated with the architectural layers of typical IoT networks and discusses various emerging security menaces. Section 3 summarizes some emerging IoT security and privacy solutions in recent literature. Section 4 presents a detailed analysis of data-driven techniques for IoT security menaces, covering both the data generation and pre-processing phase and the classification phase, and discusses the strengths, achievements, and limitations of each. Research gaps and future research directions are presented in Section 5, while Section 6 presents the conclusions.

2. Overview of IoT Network Security

The global acceptance and geometric growth of the IoT paradigm across 21st-century society are mostly due to the possibility of creating an automated, connected community of humans and their devices, gadgets, and equipment across various societal domains and industrial sectors. The IoT paradigm provides solutions for various domains by integrating information and communication technology (ICT) with hardware and software processes. While the integration of IoT technologies continues to cut across various fields, security concerns are widely regarded as the biggest threat to the widespread adoption of IoT. The authors in [32] explained that attacks on IoT devices can be quite simple: by identifying and compromising a weak link (for example, one or more vulnerable nodes), intruders can initiate fraudulent acts ranging from user data breaches to taking over critical infrastructure.

2.1. IoT Network Architectural Set-Up

The most reliable and secure convergence of the various ICT tools and technologies is built on effective IoT architecture layers. These layers often vary depending on the associated task requirements, and IoT network security can be analyzed in detail from the perspective of this architectural structure. IoT architecture is defined as ‘a framework which basically specifies the physical elements, network technical arrangements, operating procedures as well as the data formats that is being deployed’ [33,34]. Generally, there are three dominant classes of IoT architectures, as depicted in Figure 3 [27,34,35].

Three-layer architecture

The three-layer architecture is generally regarded as the most common IoT architecture, as it is easy to implement. As shown in Figure 3, the perception layer at the base consists of physical hardware devices, such as sensors, Radio Frequency Identification (RFID) tags, and cameras, which are equipped with sensing capabilities to collect real-time information and respond to the behavior of physical objects. These devices relay the collected data to the network layer for processing, analysis, transmission, and storage [37]. Lastly, users interact with applications and services at the application layer [38].

Four-layer architecture

Similar to the three-layer architecture, the base of the four-layer architecture is the perception/sensing layer, which includes devices such as sensors that collect data from their surroundings. The next layer is the network layer, through which data from the sensing layer are transported, typically to the internet, via gateway devices. The service/middleware layer handles data analysis and processing; depending on the application, it is located either in the gateway or in the cloud. As in the three-layer architecture, users interact with applications and services at the application layer [36].

Five-layer architecture

In the five-layer architecture, the network layer, which typically consists of base station nodes and network access gateways, transmits the data collected by perception layer devices to the network transmission layer, where several networking and data analytics technologies are used for processing, analysis, and storage. The analyzed data are typically presented to the end user through application layer technologies. At the topmost layer, business layer tools allow users, typically system administrators, to manage and control the various functionalities of the IoT platform [39,40].

Irrespective of the architectural setup, intrusions and attacks on IoT networks can be perpetrated via devices in any layer or via the communication between layers. The authors in [4] categorized IoT network attacks into physical, network, and software attacks. Using the four-layer architecture as a basis, the authors in [41] categorized various trending IoT attacks. Typical attacks on the perception layer include eavesdropping, radio frequency interference, reverse engineering, tampering, and general cyber/physical attacks [42]. Because some perception layer devices are exposed and vulnerable owing to their location, they are easy prey. The ISO/IEC 18000-63:2021 RFID air-interface standard, which is very popular among emerging perception layer devices, is well known for its vulnerabilities: it is highly prone to hacking, imitation, and disabling [43]. Also, as the intermediary layers of the various architectures are typically used for processing, analyzing, and storing data collected from perception layer devices, the communication protocols and standards that enable data exchange between IoT elements, including devices and other distant parts of the network, can be exploited for their security lapses. Typical protocols and standards, such as Bluetooth Low Energy, Wi-Fi, ZigBee, Ethernet, cellular networks, and Low Power Wide Area Network standards, each have their own security measures. However, they also have known vulnerabilities that are easily exploited. For example, the Bluetooth 5.0 standard uses the Secure Simple Pairing (SSP) and LE Secure Connections protocols for secure key generation. This process relies on secure key distribution and Elliptic Curve Diffie-Hellman (ECDH) cryptography. However, the standard remains susceptible to privacy/identity tracking, eavesdropping, and bluejacking.
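The ECDH key-agreement step mentioned above can be illustrated with a minimal Python sketch. It assumes a tiny textbook curve, y^2 = x^3 + 2x + 2 over GF(17) with generator (5, 1) of order 19, purely so the arithmetic stays readable; Bluetooth LE Secure Connections performs the exchange on the much larger NIST P-256 curve, and this toy version is neither secure nor representative of any production implementation. The function names are hypothetical.

```python
import secrets

# Toy textbook curve y^2 = x^3 + 2x + 2 over GF(17); far too small to be secure.
P_MOD, A = 17, 2
G = (5, 1)  # generator point
N = 19      # order of G (prime)


def point_add(p1, p2):
    """Elliptic-curve point addition; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # P + (-P) = point at infinity
    if p1 == p2:
        # Tangent slope for point doubling
        slope = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Chord slope for adding two distinct points
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with the double-and-add method."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def generate_keypair():
    """Private key: random scalar in [1, N-1]; public key: private * G."""
    private = secrets.randbelow(N - 1) + 1
    return private, scalar_mult(private, G)


# Each device generates a key pair and exchanges only its public point.
alice_priv, alice_pub = generate_keypair()
bob_priv, bob_pub = generate_keypair()

# Each side combines its own private key with the peer's public key.
alice_shared = scalar_mult(alice_priv, bob_pub)
bob_shared = scalar_mult(bob_priv, alice_pub)
print(alice_shared == bob_shared)  # True: both hold the same secret point
```

Only public points cross the air interface; an eavesdropper would have to solve the elliptic-curve discrete logarithm problem to recover the shared secret. ECDH on its own does not authenticate the peers, so an unauthenticated pairing can still be subverted by an active MiTM attacker.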
At the network layer, popular attacks include phishing, replay attacks, spoofing, and DoS [42]. At the application layer, popular attacks include cross-site scripting, malware injection, Man-in-the-Middle (MiTM) attacks, and sniffing. By exploiting vulnerabilities such as weak authorization and authentication protocols, attackers can inject malicious commands/scripts into web-based IoT dashboards or into an application’s input fields to alter or extract data from databases. Different authors have categorized these attacks using different criteria. The authors in [6] categorized them as internal/malicious, internal/non-malicious, external/opportunistic, and external/deliberate. Similarly, the authors in [32] categorized IoT network attacks based on adversary location, access level, attack strategy, and level of information damage.
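Injecting commands through an application’s input fields to alter or extract database contents is commonly known as SQL injection. The Python sketch below, which uses only the standard-library `sqlite3` module, contrasts a query built by string formatting with a parameterized query; the `devices` table, its columns, and the payload are hypothetical.

```python
import sqlite3

# Hypothetical device-registry table behind an IoT dashboard
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (id INTEGER, owner TEXT, api_key TEXT)")
conn.executemany(
    "INSERT INTO devices VALUES (?, ?, ?)",
    [(1, "alice", "k-111"), (2, "bob", "k-222")],
)


def lookup_unsafe(owner):
    # VULNERABLE: user input is pasted directly into the SQL text
    query = f"SELECT id, api_key FROM devices WHERE owner = '{owner}'"
    return conn.execute(query).fetchall()


def lookup_safe(owner):
    # Parameterized query: the driver treats `owner` strictly as data
    return conn.execute(
        "SELECT id, api_key FROM devices WHERE owner = ?", (owner,)
    ).fetchall()


payload = "x' OR '1'='1"
print(lookup_unsafe(payload))  # leaks both devices' rows, including API keys
print(lookup_safe(payload))    # []: no owner has that literal name
```

Parameterized queries close only this injection vector; script injection into web dashboards (cross-site scripting) additionally requires output encoding on the web side.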

2.2. IoT Network Popular Attacks

Irrespective of the IoT architectural setup or network scale, some of the dominant attacks on IoT networks and resources include the following:

Denial of Service (DoS)/Distributed Denial of Service (DDoS) Attacks

Arguably one of the most popular cyber-attacks, DoS/DDoS attacks attempt to overload a network to make it inaccessible to legitimate users or to degrade its performance. Simply put, the attacker floods the target devices with traffic in order to trigger a crash [44]. Figure 4 depicts a typical DoS attack, in which an intruder floods and overwhelms the network, consuming the available bandwidth and making it inaccessible to legitimate users. The first documented DDoS attack occurred in 1999 at the University of Minnesota, where a computer at the institution was flooded by a network of 114 other computers infected with Trin00 (a malicious script). The script triggered the infected computers to send superfluous packets to the institution’s infrastructure, overwhelming its computers and preventing them from handling legitimate requests [45]. Since that first documented episode, such attacks have increased geometrically in volume and are a constant threat to various systems, networks, and organizations [46]. The report in [47] stated that DDoS intelligence systems had detected 57,116 attacks by the third quarter of 2022. Alongside this increase in volume, there was an increase from 38.69% in the third quarter of 2021 to 53.53% in the third quarter of 2022. The report detailed that the majority of the attacks were perpetrated in North America, predominantly in the United States. Given the volume of IoT tools being deployed globally, there is a constant need to detect and mitigate this form of attack promptly.
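Because a flood shows up as an abnormal packet rate, a simple rate-based monitor gives an intuition for how flooding can be detected. The Python sketch below is a hypothetical per-source sliding-window counter; the window length and threshold are illustrative assumptions rather than values from the cited reports, and it is not a production detector.

```python
from collections import defaultdict, deque


class FloodDetector:
    """Flag a source whose packet count inside a sliding time window exceeds a limit."""

    def __init__(self, window_s: float = 1.0, max_packets: int = 50):
        self.window_s = window_s           # illustrative window length (seconds)
        self.max_packets = max_packets     # illustrative per-window threshold
        self.history = defaultdict(deque)  # source -> recent packet timestamps

    def observe(self, source: str, timestamp: float) -> bool:
        """Record one packet and return True if `source` now exceeds the limit."""
        times = self.history[source]
        times.append(timestamp)
        # Evict timestamps that have fallen out of the window
        while timestamp - times[0] > self.window_s:
            times.popleft()
        return len(times) > self.max_packets


detector = FloodDetector(window_s=1.0, max_packets=50)
# Hypothetical traffic: a bot sends a packet every 1 ms; a sensor reports once per second
bot_flagged = any(detector.observe("10.0.0.66", i * 0.001) for i in range(200))
sensor_flagged = any(detector.observe("10.0.0.5", float(t)) for t in range(10))
print(bot_flagged, sensor_flagged)  # True False
```

A per-source threshold is easy to evade in a distributed attack, where each bot stays below the limit; this is one reason learning-based detectors examine many traffic features jointly rather than a single counter.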

Man-in-the-middle (MiTM) attacks

Considered one of the most discussed attacks in IoT ecosystems, the MiTM attack is a major concern for many security professionals [49]. Typically, the attack scenario involves the victims (IoT end nodes) and one or more intruders. The intruder capitalizes on vulnerabilities in the communication channel to access, monitor, and manipulate data exchanged between the end nodes. Thus, the integrity and confidentiality of the exchanged data are compromised while the intruder’s presence in the network remains unknown to the end nodes [49,50]. Figure 5 depicts a typical MiTM attack. As shown in Figure 5, the intruders position themselves between legitimate communicating nodes with the goal of intercepting, manipulating, and stealing sensitive information. Tools such as GATTacker and BtleJuice can be used to launch a MiTM attack on a Bluetooth-enabled IoT network [51]. Mallik [50] explained that the core objective of the MiTM attack is to steal IoT users’ vital information and credentials, and that the information obtained in a typical attack can be used for fraudulent activities in organizational settings. Such attacks can also be deployed in the infiltration phase of an Advanced Persistent Threat attack. A 2020 report estimated that MiTM attacks were responsible for approximately USD 2 billion in annual losses worldwide [52]. The report further estimated that MiTM attacks would account for 19% of all successful cyber-attacks in 2021.
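One common way to make in-transit manipulation detectable is to attach a message authentication code that an on-path attacker cannot forge without the key. The Python sketch below uses the standard-library `hmac` module with HMAC-SHA256; it assumes both end nodes already share a secret key (the key and message fields are hypothetical), and it illustrates integrity protection rather than a complete MiTM defense.

```python
import hashlib
import hmac
import json

# Assumption: both endpoints were provisioned with this key out of band
SHARED_KEY = b"hypothetical-preshared-key"


def sign(message: dict) -> dict:
    """Attach an HMAC-SHA256 tag computed over a canonical JSON encoding."""
    body = json.dumps(message, sort_keys=True).encode()
    tag = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return {"body": message, "tag": tag}


def verify(packet: dict) -> bool:
    """Recompute the tag and compare it in constant time."""
    body = json.dumps(packet["body"], sort_keys=True).encode()
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, packet["tag"])


packet = sign({"device": "thermostat-7", "setpoint": 21})
print(verify(packet))  # True

# An on-path attacker alters the payload but cannot recompute the tag
tampered = {"body": {**packet["body"], "setpoint": 35}, "tag": packet["tag"]}
print(verify(tampered))  # False
```

HMAC provides integrity and authenticity but not confidentiality, and it does not stop an intercepted message from being resent unchanged; in practice, authenticated channels such as TLS/DTLS with mutual authentication combine these protections.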

Ransomware

Ransomware is widely regarded as one of the most alarming and fastest-growing attacks targeting IoT networks. The attackers typically encrypt the victim’s data, thereby denying legitimate users access to it, and then demand a ransom for the decryption key [53,54]. Figure 6 illustrates a ransomware attack. As shown in Figure 6, the attacker gains access to a legitimate user’s (victim’s) data, encrypts the data, and demands that a ransom be paid (typically in difficult-to-trace digital currencies) before the decryption key is released to the victim. The authors in [54] explained that cybersecurity experts, standard manufacturers, law enforcement agencies, IoT users, and enthusiasts struggle to deal with this specific type of attack. Historically, the first documented ransomware attack, known as the ‘AIDS Trojan’, was launched in 1989 by Joseph Popp, a trained biologist [55], who distributed 20,000 infected floppy disks to participants of the World Health Organization conference [53,54,55]. Incidents such as the 2007 locker ransomware attack that targeted Russia, the devastating 2016 Mamba ransomware attack on a municipal transport system in San Francisco, United States, and the 2017 Mamba ransomware attack on corporate networks in Saudi Arabia [53,56] have shown the devastating effects of this attack type and how it has grown into large-scale attacks. Consequences of ransomware attacks include temporary and sometimes permanent loss of information, disruption of services, and financial losses.
A 2024 report by Veeam (a United States information technology company) [57], based on a survey of 350 chief information security officers, information security professionals, and backup administrators from a variety of organizations, estimated that 96% of ransomware attacks targeted backup repositories and that about 28% of the organizations that paid the ransom were unable to recover all lost data. The report also estimated that only 14% were able to recover their data without paying a ransom.
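Because ransomware replaces file contents with ciphertext, a sudden rise in byte-level entropy is a commonly discussed heuristic signal. The Python sketch below computes Shannon entropy in bits per byte; the 7.5 bits/byte threshold and the sample data are illustrative assumptions, and random bytes stand in for ciphertext.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty or constant data, at most 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Heuristic: a near-uniform byte distribution suggests encrypted (or compressed) data."""
    return shannon_entropy(data) >= threshold


plain = b"temperature=21.5;humidity=40;" * 100  # hypothetical sensor log
random_like = os.urandom(4096)                  # stands in for ciphertext
print(looks_encrypted(plain), looks_encrypted(random_like))  # False True
```

Compressed files such as images or archives also have near-maximal entropy, so this signal produces false positives on its own and is typically combined with other behavioral features.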

Physical Damage

Because IoT infrastructure is ubiquitous, some IoT equipment, particularly perception layer infrastructure, is usually deployed in remote and unsecured spaces [58]. For example, emerging smart cities and smart grids are equipped with perception layer elements, including sensors, cameras, fiber and cable installations, and smart lights, that are either exposed or located in rural, underground, and secluded places that tend to be inadequately guarded. Adversaries often capitalize on these vulnerabilities to access the devices and inflict physical attacks and damage on the infrastructure [59]. While the major focus is on cybersecurity, physical security is equally important. Physical attack variants include theft, vandalism, signal jamming, node tampering, and node replication attacks. These physical attacks typically result in service disruptions. For example, apart from interrupting transportation and telecommunication services, underground cable theft costs the Republic of South Africa’s economy an estimated ZAR 5 billion annually [60]. By stealing or tampering with IoT infrastructure, attackers can access users’ credentials and possibly inflict physical harm on users. Thus, IoT infrastructure demands adequate security measures to prevent such attacks and to reduce the impact of unexpected equipment failures on networks.

Replay Attack

This type of attack usually takes place during synchronization: a malicious node captures and stores transmitted information, then retransmits it later to mislead the destination node [61]. Gargoum [62] explained that this form of attack involves capturing communication signals between components of an industrial control system and replaying them later to disrupt network operations. The attack is typically carried out by repeatedly transmitting captured data packets across the network, for example, as requests to retransmit missed frames. It often exhausts network/system resources and back-end database resources (memory, battery, and processor). Replay attacks are classified as high-risk attacks, but they can be mitigated and prevented relatively easily. Other notable attacks that threaten the security of IoT networks include sinkhole and Sybil attacks.
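A common way to make replayed messages easy to reject is for each sender to include a monotonically increasing counter and a timestamp, which the receiver checks. The Python sketch below is a minimal, hypothetical receiver-side check; the 5-second freshness window is an assumption, and the sketch presumes the counter and timestamp are themselves covered by a MAC, so an attacker cannot simply rewrite them.

```python
import time
from typing import Dict, Optional


class ReplayGuard:
    """Receiver-side replay check using per-sender counters and a freshness window."""

    def __init__(self, max_age_s: float = 5.0):
        self.max_age_s = max_age_s              # assumed tolerance for delay and clock drift
        self.last_counter: Dict[str, int] = {}  # sender -> highest accepted counter

    def accept(self, sender: str, counter: int, sent_at: float,
               now: Optional[float] = None) -> bool:
        """Return True only for fresh messages carrying a strictly increasing counter."""
        now = time.time() if now is None else now
        if abs(now - sent_at) > self.max_age_s:
            return False  # stale (or future-dated) message: possible delayed replay
        if counter <= self.last_counter.get(sender, -1):
            return False  # counter already used: duplicate or replayed frame
        self.last_counter[sender] = counter
        return True


guard = ReplayGuard(max_age_s=5.0)
print(guard.accept("meter-1", 1, sent_at=100.0, now=100.5))  # True: fresh, new counter
print(guard.accept("meter-1", 1, sent_at=100.0, now=101.0))  # False: counter replayed
print(guard.accept("meter-1", 2, sent_at=100.0, now=130.0))  # False: message too old
```

Counters catch duplicates within a session, while the freshness window bounds how long a captured frame remains usable; both require the receiver to keep a small amount of per-sender state, which is a real cost on constrained devices.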

3. IoT Network Security and Privacy Solutions

Security and privacy concerns continue to be significant issues for all stakeholders and a major obstacle to the wider adoption of IoT technologies [63]. With applications cutting across all fields of life and critical infrastructure, elevated security concerns are considered the biggest threat to the paradigm [4]. As explained in [42], without an effective system that enables devices to share and exchange data privately and securely, users may shy away from these technologies, as there will be no guarantee that their personal data will be kept secure.

As with conventional information security setups, requirements including confidentiality, integrity, availability, authentication, authorization, and access control also apply to IoT security. Ensuring that IoT data are accurate, protected, and used only as intended, and that IoT data and devices are accessible when needed, is vital for the long-term goals of IoT network infrastructures. This fundamental security provision for IoT devices and networks combines software-based security measures with hardware-based security features. Thus, irrespective of the IoT architectural setup, the hardware devices (i.e., physical objects and devices, cloud gateways, etc.), network-based infrastructure (i.e., cloud back-end, standards, protocols, etc.), and all application-based components (i.e., applications, firmware, interfaces, etc.) must be well secured.

In recent literature, various studies have analyzed and proposed security solutions, focusing on key management and encryption systems, blockchain technology, and intelligent intrusion detection systems (IDSs). HaddadPajouh et al. [27] explained that IoT network layer environments should focus on (a) implementing appropriate (lightweight) encryption systems to maintain the confidentiality and integrity of data, (b) installing good tracing parameters for authorizing transferred packet data, and (c) implementing appropriate threat-hunting modules for detecting threats. Conventional communication and routing protocols, such as TLS, DTLS, and IPsec, are encouraged to provide secure communication and routing. Various encryption algorithms and associated methodologies have been proposed and deployed in various standards to provide and maintain end-to-end communication security for IoT technologies [64,65,66]. Given its attractive characteristics, including a decentralized architecture, immutability, and its distributed-ledger nature, blockchain technology is a viable solution for secure data storage, effective access control, and protection against typical attacks, including DoS attacks. Thus, blockchain technology has been proposed and deployed to provide and maintain end-to-end security for IoT technologies [67,68,69,70]. However, traditional security measures often lack flexibility and adaptability to specific IoT contexts and emerging IoT characteristics. Given the limited energy storage and compactness of IoT devices (owing to factors such as architectural design and ease of use), securing them against various threats using conventional security methodologies tends to incur additional resource demands, including extensive computation, storage, and energy.
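The immutability property mentioned above comes from hash-linking: each block stores the hash of its predecessor, so altering any past record invalidates the chain. The Python sketch below shows only this hash-chain core with hypothetical access-control events; it omits the distributed consensus, peer replication, and digital signatures that an actual blockchain deployment would add.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, data):
    """SHA-256 over the block contents, including the previous block's hash."""
    payload = json.dumps({"index": index, "prev": prev_hash, "data": data}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append_block(chain, data):
    """Append a record linked to the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({"index": index, "prev": prev_hash, "data": data,
                  "hash": block_hash(index, prev_hash, data)})


def verify_chain(chain):
    """Recompute every block hash and check every back-link."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV
        if block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["prev"], block["data"]):
            return False
    return True


ledger = []
append_block(ledger, {"device": "lock-3", "event": "access_granted", "user": "u17"})
append_block(ledger, {"device": "lock-3", "event": "access_revoked", "user": "u17"})
print(verify_chain(ledger))  # True

ledger[0]["data"]["event"] = "access_denied"  # retroactive tampering
print(verify_chain(ledger))  # False
```

Tamper evidence alone is not tamper resistance: a single party holding the whole chain could recompute every hash, which is why decentralized replication and consensus are essential to the security argument for blockchain.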

As IDSs are widely recognized as an effective security methodology for information technology networks, numerous statistical formulations have been proposed for IoT security analysis and assessment. However, the majority of these statistical formulations are inherently rigid and computationally expensive for emerging IoT scenarios. Also, considering factors such as the huge volume of data associated with IoT end devices, the uncertainties of sensing and other measurement devices, and the computational complexity of emerging IoT applications, the limitations of traditional security methodologies have been consistently exposed. Hence, proactive, robust, prompt, and reliable methodologies have become essential, particularly in the face of relentless sophisticated attacks. As explained by Naha et al. [71], detecting attack scenarios as early as possible is vital to reduce the magnitude of potential damage.

In recent times, data-driven AI models have proven highly effective in various applications and studies. These models are widely proposed in IoT security analysis for monitoring, predicting, detecting, and classifying various security menaces. Using various AI subsets, complex and diverse network data streams can be analyzed by extracting features that capture hidden trends, behaviors, and patterns. The models can effectively analyze and evaluate features of IoT devices and networks with embedded intelligence; for example, they can detect malicious code in an application or software within IoT networks. In real-world settings, Google uses machine learning extensively to protect email users from spam: using neural network models, it analyzes billions of emails and identifies patterns associated with spam [72]. Similarly, Amazon, through Amazon GuardDuty, leverages machine learning models to detect anomalous activity and potential threats across accounts, workloads, and data, providing real-time security monitoring and threat detection [73].

4. Performance Evaluation and Comparative Analysis of Data-Driven AI Models for IoT Network Security and Privacy Menaces

Protecting emerging IoT network infrastructures against sophisticated attacks using conventional security solutions is becoming increasingly challenging. The overwhelming quantity of data generated, with its diverse features, has contributed largely to these solutions’ poor performance. Data-driven AI models, including machine learning, deep learning, and reinforcement learning models, are proving to be viable options, as they offer characteristics such as high detection performance and fast processing. These models are widely proposed in the IoT literature for the monitoring, intrusion detection, prediction, and classification of security challenges that conventional security methodologies, such as whitelisting, encryption, authentication, and firewalls, are unable to mitigate. Xiao et al. [74] investigated IoT system attack models and reviewed security solutions based on various machine learning models. Aldahiri et al. [75] reviewed various machine learning approaches and their application to IoT medical data. Bhayo et al. [76] explored key machine learning models, including naive Bayes (NB), decision tree (DT), and support vector machine (SVM) algorithms, for detecting and classifying DDoS attacks in IoT network packets. Using benchmark IoT datasets, Islam et al. [77] experimented with shallow learners, including DT, SVM, and random forest (RF), as well as deep learning variants, including the deep belief network and long short-term memory (LSTM), for detecting IoT network attacks.

Conventionally, the methodology for deploying data-driven learning models in IoT security studies is divided into two main phases: (1) the data collection and pre-processing phase and (2) the analysis/classification phase. In some studies, the authors additionally deployed optimization techniques, while others deployed ensembles of multiple algorithms to boost classification performance.
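The two phases can be illustrated with a minimal, self-contained sketch. It assumes purely numeric flow features (the feature names and values are hypothetical), uses min-max scaling as the pre-processing step, and uses a simple nearest-centroid learner as a stand-in for the classifiers discussed in this section; it is not the method of any of the reviewed studies.

```python
from statistics import mean


def min_max_fit(rows):
    """Phase 1 (pre-processing): learn each feature's minimum and maximum from training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_transform(rows, mins, maxs):
    """Scale every feature to [0, 1] using the training statistics; constant features map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def fit_centroids(rows, labels):
    """Phase 2 (classification): a nearest-centroid learner stores one mean vector per class."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*members)] for label, members in by_class.items()}


def predict(rows, centroids):
    """Assign each row to the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - z) ** 2 for x, z in zip(a, b))
    return [min(centroids, key=lambda label: sq_dist(row, centroids[label])) for row in rows]


# Hypothetical flow features: [packets per second, mean packet size in bytes]
train_x = [[10, 500], [12, 520], [900, 60], [950, 64]]
train_y = ["benign", "benign", "ddos", "ddos"]
mins, maxs = min_max_fit(train_x)
centroids = fit_centroids(min_max_transform(train_x, mins, maxs), train_y)
print(predict(min_max_transform([[11, 510], [920, 62]], mins, maxs), centroids))
# ['benign', 'ddos']
```

Note that the scaling statistics are learned from the training rows only and then reused on new traffic, which keeps information from the test data out of the pre-processing phase.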

4.1. IoT Network Data Generation and Pre-Processing Phase

As explained in [10], the core component for utilizing data-driven models in IoT security studies is the dataset or deployed testbed. IoT network security studies based on data-driven learning models involve capturing and analyzing sufficient and balanced IoT network traffic data. The underlying assumption is that an intruder who compromises an IoT network, or an information system in general, will create or leave some form of footprint or disruption in that data. To obtain a sufficiently balanced dataset for the algorithms’ learning and analysis, intrusions, including emerging cyberattacks, are usually integrated into the testbeds. However, due to the limited availability of IoT datasets, researchers usually turn to publicly available datasets for IoT security analysis. Some notable datasets used in recent studies involving IoT network security, anomaly detection, and malware analysis are discussed below in terms of their characteristics, strengths, and limitations.

IoT-23 Dataset

The IoT-23 dataset is a recently published dataset (captured between 2018 and 2019) generated using typical smart home devices, including the popular Amazon Echo, a smart LED lamp, and a smart door lock. The dataset was generated by the Avast AIC laboratory and the Stratosphere Laboratory team at the Czech Technical University [78]. It contains an estimated 325 million labeled network traffic flows. In terms of structure, the dataset comprises 20 malware captures executed on IoT devices and 3 captures of benign IoT device traffic. The dataset has been widely used in IoT network traffic analysis, anomaly detection, and classification studies [79,80,81,82,83,84].

Despite its popularity in recent studies, the IoT-23 dataset’s class imbalance is a major concern that often makes the learning process biased and ineffective [79,82]. Some of the reviewed studies explored oversampling and undersampling techniques to address this issue. With undersampling, samples from the dataset’s majority class are removed to balance the dataset, while oversampling increases the number of minority class samples, either by replicating existing samples or by generating synthetic ones [85,86]. To mitigate the class imbalance, the authors in [79] deployed various techniques, including the Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), cost-sensitive learning, and bagging. SMOTE, as defined in (1) [87,88], is a popular algorithm used to generate synthetic samples for the minority class in imbalanced datasets.

X_{new} = X_{i} + \lambda \times (X_{j} - X_{i})

(1)

where X_{new} is the new synthetic sample, X_{i} is a randomly chosen minority class sample, X_{j} is another minority class sample randomly chosen from the k-nearest neighbors of X_{i}, and λ is a random number in the range [0, 1], which ensures that the synthetic point lies on the line segment between X_{i} and X_{j}.

On the other hand, ADASYN extends SMOTE by generating synthetic data adaptively: it focuses on minority class samples that are harder to learn, i.e., those with more majority class samples among their nearest neighbors, and generates more synthetic samples around them. Thus, ADASYN introduces an adaptive weighting mechanism into (1), as defined in (2) [87,89].

G_{i} = \hat{G} \cdot r_{i}

(2)

where G_{i} is the number of synthetic samples to be generated for minority sample X_{i}, Ĝ is the total number of synthetic samples to be generated, and r_{i}, defined in (3), is the normalized difficulty level of the sample.

r_{i} = \frac{\Delta_{i}}{\sum_{l=1}^{m} \Delta_{l}}

(3)

where m is the number of minority class samples and Δ_{i}, defined in (4), measures how hard to learn the sample is.

\Delta_{i} = \frac{\text{number of majority class samples among the } k \text{ nearest neighbors of } X_{i}}{k}

(4)
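Equations (1) to (4) can be sketched in plain Python as follows. This is a minimal illustration under simplifying assumptions: Euclidean distance, brute-force neighbor search, G_i rounded to the nearest integer (so the total may differ slightly from Ĝ), and at least two minority samples. The function names are hypothetical; practical work would normally rely on an established implementation such as the imbalanced-learn library.

```python
import math
import random


def nearest(point, rows, k, skip=None):
    """Indices of the k rows closest to `point` (Euclidean distance), ignoring index `skip`."""
    order = sorted((j for j in range(len(rows)) if j != skip),
                   key=lambda j: math.dist(point, rows[j]))
    return order[:k]


def interpolate(x_i, x_j, rng):
    """Equation (1): X_new = X_i + lambda * (X_j - X_i), with lambda drawn uniformly from [0, 1)."""
    lam = rng.random()
    return [a + lam * (b - a) for a, b in zip(x_i, x_j)]


def smote(minority_rows, n_new, k=3, seed=0):
    """Plain SMOTE: pair a random minority row X_i with one of its k nearest minority neighbors X_j."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority_rows))
        j = rng.choice(nearest(minority_rows[i], minority_rows, k, skip=i))
        synthetic.append(interpolate(minority_rows[i], minority_rows[j], rng))
    return synthetic


def adasyn(X, y, minority, total, k=3, seed=0):
    """ADASYN: share `total` synthetic rows (G-hat) according to how hard each minority row is."""
    rng = random.Random(seed)
    idx = [i for i, label in enumerate(y) if label == minority]
    rows = [X[i] for i in idx]
    # Equation (4): Delta_i = share of majority rows among the k nearest neighbors in the full data.
    delta = [sum(y[j] != minority for j in nearest(X[i], X, k, skip=i)) / k for i in idx]
    if sum(delta) == 0:
        return []  # no minority row has majority neighbors, so nothing is weighted
    synthetic = []
    for pos, d in enumerate(delta):
        # Equations (3) and (2): r_i = Delta_i / sum(Delta), G_i = G-hat * r_i (rounded here).
        g_i = round(total * d / sum(delta))
        for _ in range(g_i):
            # As in SMOTE, the partner X_j is drawn from the minority-class neighbors.
            j = rng.choice(nearest(rows[pos], rows, k, skip=pos))
            synthetic.append(interpolate(rows[pos], rows[j], rng))
    return synthetic


X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1, 1]  # label 1 is the (hypothetical) minority class
new_rows = adasyn(X, y, minority=1, total=4)
print(len(new_rows))  # 4: all generated around [1, 1], the only minority row with majority neighbors
```

In this toy example, plain SMOTE would spread synthetic rows uniformly over the minority samples, whereas ADASYN places all of them around the minority sample that sits inside the majority region, which is exactly the adaptive weighting expressed by (2) to (4).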

Similarly, El-Hariri et al. [90] explored ADASYN, SMOTE, and undersampling for the same purpose. Based on their results, the authors argued that ADASYN is a better technique than SMOTE or random undersampling. They explained that ADASYN generates synthetic data for the minority classes that often contains little noise. Furthermore, they noted that ADASYN offers better flexibility and control during synthetic sample generation, allowing users to customize the sampling procedure to the specific needs of the dataset. In a related study, Nicolas–Alin [91] used statistical correlation to eliminate data features that are not related to the label column. Specifically, the study computed a correlation matrix for each of the 33 data files in the IoT-23 dataset and averaged the results. In another related study, Abdalgawad et al. [80] applied various pre-processing steps, including feature selection and encoding, to the dataset. Nanthiya et al. [81] deployed principal component analysis (PCA) to select relevant features in the dataset and compared the results obtained with and without PCA as the feature selection tool.
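The correlation-based filtering attributed to [91] can be sketched as follows. The sketch assumes a binary 0/1 label, computes the absolute Pearson correlation between each feature and the label in every capture file, averages the scores across files, and keeps features above a hypothetical threshold of 0.1; the exact procedure used in [91] may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def averaged_label_correlation(files, n_features):
    """Mean absolute feature-label correlation across capture files.

    Each file is a list of (feature_vector, label) pairs, with label 0 (benign) or 1 (malicious).
    """
    totals = [0.0] * n_features
    for rows in files:
        labels = [label for _, label in rows]
        for f in range(n_features):
            totals[f] += abs(pearson([x[f] for x, _ in rows], labels))
    return [t / len(files) for t in totals]


def select_features(scores, threshold=0.1):
    """Keep the indices of features whose averaged |r| reaches the (hypothetical) threshold."""
    return [f for f, s in enumerate(scores) if s >= threshold]


# Two hypothetical capture files: feature 0 tracks the label, feature 1 is constant.
files = [
    [([1, 5], 0), ([2, 5], 0), ([8, 5], 1), ([9, 5], 1)],
    [([0, 3], 0), ([7, 3], 1)],
]
scores = averaged_label_correlation(files, n_features=2)
print(select_features(scores))  # [0]
```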

TON_IoT Dataset

Similar to IoT-23, TON_IoT is a recently published dataset generated from a large-scale, heterogeneous IoT network setup; it comprises telemetry data from sensors, Linux operating system (OS) records, Windows OS records, and network traffic [92,93,94,95]. The dataset was conceptualized and generated at the Cyber Range and IoT Labs of the School of Engineering and Information Technology (SEIT), University of New South Wales (UNSW) Canberra, Australia. In terms of content, it contains both normal traffic and various modern attack types, including ransomware, password attacks, scanning, DoS, DDoS, data injection, backdoor, cross-site scripting (XSS), and MiTM attacks. It has been used in various recent IoT intrusion detection and security analysis studies [93,94,95,96,97,98,99].

Like the IoT-23 dataset, the TON_IoT dataset is known for its class imbalance [100,101]. The dataset also has other challenges, including numerous missing values, and it is generally judged to contain a large number of irrelevant features [98,99,100,101,102]. These issues make the pre-processing steps critical to the effective use of the dataset. Zhong et al. [100] addressed the class imbalance by combining SMOTE with a redundancy-based Tomek link removal technique. The authors in [99] deployed the popular chi-square (Chi²) technique to select important features in the dataset, while SMOTE was deployed to address the class imbalance. In a related study, Gad et al. [101] also used Chi² to reduce the data features during the pre-processing phase. Justifying the choice of Chi² as an efficient feature selection tool, Sarhan et al. [93] explained that the technique effectively measures the dependence between data features and their respective class labels. Using a comparative approach to feature selection, Guo et al. [98] compared the Chi² and Spearman rank correlation coefficient methods for selecting relevant features in the TON_IoT dataset. Similarly, the authors in [93] experimented with various methods, including Chi² and information gain (IG), to analyze and rank the TON_IoT data features. Guo et al. [96] also compared five key feature selection methods, namely IG, gain ratio (GR), Chi², Pearson correlation coefficient (PCC), and symmetric uncertainty, to select dominant features in the dataset.
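For a categorical feature, the Chi² score compares the observed feature-label contingency table with the counts expected if feature and label were independent; larger scores indicate stronger dependence. The minimal sketch below ranks features this way. The column names are hypothetical, and library implementations (for example, scikit-learn's chi2 function, which expects non-negative feature values) may compute the statistic differently.

```python
from collections import Counter


def chi_square(feature, labels):
    """Chi-square statistic for independence between a categorical feature and the class label."""
    n = len(labels)
    f_counts = Counter(feature)
    y_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for f_val, f_n in f_counts.items():
        for y_val, y_n in y_counts.items():
            expected = f_n * y_n / n  # count expected under independence
            stat += (observed[(f_val, y_val)] - expected) ** 2 / expected
    return stat


def rank_features(columns, labels):
    """Return feature names sorted from most to least label-dependent."""
    scores = {name: chi_square(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


labels = ["normal", "normal", "attack", "attack"]
columns = {
    "proto": ["tcp", "tcp", "udp", "udp"],  # perfectly aligned with the label
    "flag": ["a", "b", "a", "b"],           # independent of the label
}
print(chi_square(columns["proto"], labels), chi_square(columns["flag"], labels))  # 4.0 0.0
print(rank_features(columns, labels))  # ['proto', 'flag']
```

Selecting the top-ranked features then amounts to keeping the first few names returned by rank_features.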

Bot-IoT Dataset

The Bot-IoT dataset was created in 2018 and published in 2019 for the analysis of botnet attacks in IoT networks [103,104]. Like TON_IoT, the Bot-IoT dataset was generated by the UNSW Canberra Cyber Range Laboratory. It was produced in a testbed environment consisting of multiple virtual machines running various OSs, network firewalls, network taps, the Node-RED tool, and the Argus network security tool. The dataset includes a variety of prevalent attacks, namely DDoS, DoS, OS and service scans, keylogging, and data exfiltration [105]. It contains approximately 70 million records.

The dataset has been deployed in various intrusion detection, attack analysis, and general cybersecurity studies [86,106,107,108,109,110]. Pokhrel et al. [107] used the dataset to analyze DDoS attacks in IoT networks. Leevy et al. [109] also deployed it to analyze information theft. Similarly, Jayalaxmi et al. [111] used the dataset to analyze bots in a general IIoT network. Like several other modern benchmark datasets, Bot-IoT is known for its class imbalance [107,112,113], and the authors in [107,112] proposed SMOTE to address it.

Moreover, since the original data are voluminous (approximately 70 million records), most studies apply some form of pre-processing to ensure that the data-driven models are effective. Apart from basic conventional pre-processing steps, such as normalization and numericalization (encoding categorical values as numbers), various authors deployed other pre-processing methods, including feature selection techniques such as search algorithms. Shafiq et al. [106] proposed a feature selection metric approach based on a wrapper technique to filter and select relevant features for the learning model. To simplify the data-driven models’ task, Leevy et al. [114] used only a fraction of the dataset’s features; specifically, the authors used 3 of the 29 Bot-IoT features for their IoT intrusion analysis. In a similar study, the authors of [109] used a variant of the original dataset, experimenting with about 5% of the original instances to analyze information theft in IoT systems.
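A wrapper method scores candidate feature subsets by training and evaluating a learner on each subset. The sketch below shows generic greedy forward selection, assuming a hypothetical nearest-centroid learner and a separate validation split; it illustrates the wrapper idea only and is not the specific metric proposed in [106].

```python
def centroid_accuracy(train, valid, feats):
    """Score a feature subset: fit a nearest-centroid model on `train`, return accuracy on `valid`.

    Rows are (feature_vector, label) pairs; `feats` lists the feature indices to use.
    """
    groups = {}
    for x, label in train:
        groups.setdefault(label, []).append([x[f] for f in feats])
    centroids = {label: [sum(c) / len(c) for c in zip(*rows)] for label, rows in groups.items()}

    def classify(x):
        v = [x[f] for f in feats]
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))

    return sum(classify(x) == label for x, label in valid) / len(valid)


def forward_select(train, valid, n_features, max_feats):
    """Greedy wrapper: add the feature that most improves validation accuracy; stop when none helps."""
    chosen, best = [], 0.0
    while len(chosen) < max_feats:
        candidates = [f for f in range(n_features) if f not in chosen]
        if not candidates:
            break
        score, feat = max((centroid_accuracy(train, valid, chosen + [f]), f) for f in candidates)
        if score <= best:
            break
        chosen.append(feat)
        best = score
    return chosen, best


# Hypothetical rows: feature 0 separates the classes; features 1 and 2 do not help on validation.
train = [([0, 5, 8], "benign"), ([1, 1, 9], "benign"), ([10, 4, 2], "attack"), ([11, 2, 3], "attack")]
valid = [([0.5, 2, 2], "benign"), ([10.5, 5, 9], "attack")]
print(forward_select(train, valid, n_features=3, max_feats=2))  # ([0], 1.0)
```

Because every candidate subset requires retraining the learner, wrapper methods are usually more expensive than filter methods such as Chi² or correlation ranking, which is one reason the choice of pre-processing affects the overall computational cost.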

IoTID20 Dataset

The IoTID20 dataset, created in 2020, was generated using a testbed that models a typical smart home. The setup includes conventional IoT devices, such as AI speakers, WiFi cameras, laptops, smartphones, access points, and routers [115]. In the setup, the cameras and speakers were designated as victim devices, while the remaining devices were designated as attacking devices. The popular Nmap tool was deployed to simulate a variety of prevalent attack scenarios. The IoTID20 dataset has 86 features and 625,784 instances. Given its relevance to the field, the dataset has been used in various IoT intrusion analyses and cybersecurity studies [1,116,117,118,119,120]. The authors of [121] used this and similar datasets to analyze DoS/DDoS and Sybil attacks in a fog-IoT model. Similarly, Shin et al. [122] used the dataset to analyze general network intrusions.

Like several other modern open-source benchmark datasets, IoTID20 is known for its class imbalance [90,120,122]. To achieve effective results, Al-Akhras et al. [116] used the Repeated Edited Nearest Neighbor (RENN), Encoding Length (Explore), and Decremental Reduction Optimization Procedure 5 (DROP5) algorithms to filter out noise in the dataset before applying SMOTE to deal with the imbalance. The authors further experimented with particle swarm optimization, the grey wolf optimizer, and the multi-verse optimizer to remove redundant data features. Apart from using SMOTE to address the class imbalance, the authors of [120] applied typical pre-processing steps, including data cleaning, encoding, and normalization, as well as feature reduction using correlation analysis. El-Hariri et al. [90] adopted a comparative approach, applying different data balancing techniques, including SMOTE, ADASYN, and random undersampling, to the dataset.
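Random undersampling, one of the balancing techniques compared in [90], can be sketched as follows: every class is reduced to the size of the smallest class by sampling without replacement. This is a minimal illustration with hypothetical labels; unlike SMOTE and ADASYN, it discards majority class samples and can therefore lose information.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Shrink every class to the size of the smallest class by sampling without replacement.

    Returns new (rows, labels) lists; the inputs are left unchanged.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(members) for members in by_class.values())
    kept_rows, kept_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            kept_rows.append(row)
            kept_labels.append(label)
    return kept_rows, kept_labels


flows = list(range(10))                  # hypothetical flow IDs
labels = ["normal"] * 8 + ["scan"] * 2   # 8:2 class imbalance
kept, kept_labels = random_undersample(flows, labels)
print(sorted(kept_labels))  # ['normal', 'normal', 'scan', 'scan']
```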

Other highly rated datasets used in recent IoT security studies to evaluate the performance of data-driven AI models for intrusion analysis and detection include the UNSW-NB15 [86,109,121], N-BaIoT [116,123,124,125], CICIDS2018 [126,127,128], and CICIDS2017 [129,130,131] datasets.

As shown in Table 2, most of the benchmark datasets incorporated emerging cyberattacks during data generation. Table 2 also shows that the datasets’ generation setups predominantly involve small-scale home automation devices and some form of simulation. Since simulations may not fully depict real-world situations and scenarios, and since the generation processes are based on small-scale smart home setups, the datasets may not be fully applicable to IIoT intrusion analysis. Furthermore, class imbalance is a common problem across the benchmark datasets.
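A quick way to quantify the class imbalance of a benchmark dataset before choosing a balancing strategy is to compute each class’s share of the data and the ratio between the largest and smallest classes. The helper below is a hypothetical sketch that operates on a plain list of labels.

```python
from collections import Counter


def class_report(labels):
    """Return each class's share of the data and the imbalance ratio (largest / smallest class)."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.most_common()}
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio


labels = ["benign"] * 90 + ["ddos"] * 9 + ["scan"] * 1
shares, ratio = class_report(labels)
print(shares)  # {'benign': 0.9, 'ddos': 0.09, 'scan': 0.01}
print(ratio)   # 90.0
```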

4.2. IoT Network Data Analysis/Classification Phase

With generated datasets containing millions of instances and numerous features, automated solutions, specifically AI algorithms, are considered viable options for monitoring, identifying, and responding to intrusions, as they do not require constant human supervision [132]. For example, through its AI-powered FortiGuard security services, the cybersecurity firm Fortinet develops and uses machine learning and AI technologies to provide timely, consistent protection and actionable threat intelligence [133]. Various traditional machine learning techniques, such as SVM [77,91,118], DT [99,116,117,118], and k-nearest neighbors (KNN) [99,107], as well as various deep learning models [80,100,134,135,136] and reinforcement learning models [137,138,139,140,141], have been proposed for IoT network intrusion monitoring, prediction, and detection in recent studies.

While conventional machine learning models, such as DT and SVM, have shown good performance in this task, they are considered less effective at generalizing in complex environments with changing features, such as IoT networks. Ullah et al. [142] proposed a transformer neural network-based model for detecting intrusions in MQTT-enabled IoT networks. The model leveraged the parallel processing capability of the transformer neural network for improved and prompt attack detection. Similarly, the authors of [143] proposed a transformer model for detecting intrusions, citing its well-known adaptability and reliability, and evaluated it on the NSL-KDD and UNSW-NB15 datasets.

Traditionally, most IoT IDS models would be expected to perform binary classification, since the task is logically to determine whether or not the network contains intrusive elements. However, as research on IoT networks has expanded, various emerging attacks and intrusion scenarios are being incorporated into the data generation process. Consequently, most classification models are multiclass rather than binary (and, in some cases, both). In a study involving both binary and multiclass classification, Ahli et al. [84] modeled various classifiers, including neural networks, for detecting malicious traffic flows in the popular Aposemat IoT-23 dataset. In a similar study, the authors of [134] deployed a convolutional neural network for a multiclass classification task using four benchmark IoT datasets, namely the Bot-IoT, IoT network intrusion, MQTT-IoT-IDS2020, and IoT-23 intrusion detection datasets.

While some authors deployed single models, several recent studies adopted a comparative approach, evaluating different learners on specific datasets to show how each learner performs given the datasets’ features and characteristics. Khanday et al. [144] modeled several classifiers, including SVM, Gaussian naive Bayes, and AdaBoost, for the binary and multiclass classification of IoT attacks. Using the IoT-23 dataset, the authors achieved the best result with Gaussian naive Bayes. Vitorino et al. [145] compared supervised, unsupervised, and reinforcement learning models on a modified version of the IoT-23 dataset, considering both binary and multiclass classification. Specifically, the authors developed models including SVM, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), isolation forest (iForest), local outlier factor (LOF), and double deep Q-network (DDQN). Using popular metrics, including accuracy, F1 score, and AUC, Zhang et al. [146] compared the performance of DT, naive Bayes, SVM, RF, XGBoost, convolutional neural networks (CNN), and recurrent neural networks (RNN) on the KDD CUP99 and NSL-KDD datasets.

To reduce uncertainty and improve both robustness and predictive performance, several authors have explored combining multiple models for IoT network security analysis. Using the TON_IoT dataset for evaluation, Mousa’B et al. [102] modeled CNN and extreme learning machine (ELM) models for IoT network security analysis. De Souza et al. [147] experimented with a combination of RF, extra trees (ET), and a deep neural network for the analysis of the Bot-IoT, IoTID20, NSL-KDD, and CICIDS2018 datasets. Leevy et al. [109] compared the performance of four ensemble classifiers (CatBoost, LightGBM, XGBoost, and RF) and four non-ensemble classifiers (DT, logistic regression, naive Bayes, and a multilayer perceptron (MLP)) for detecting information theft attacks using the Bot-IoT dataset. Kabir et al. [148] compared a stacking of XGBoost, KNN, and RF and a stacking of XGBoost, NN, KNN, and RF with an ET classifier, using the mutual information gain feature selection method, for the analysis of the packet-based UNSW-NB15 dataset. In [97], the authors explored various ensembles involving RF, ET, and XGBoost, with logistic regression as the meta-classifier, for the analysis of the TON_IoT dataset. Popular metrics used to evaluate the performance of the various learning models include accuracy, F-score, recall, precision, the area under the receiver operating characteristic curve (ROC-AUC), and computational time. Accuracy, precision, recall, and F-score are defined in (5), (6), (7), and (8), respectively [11,122,149]. These metrics have been extensively discussed in the literature [4,8,11,12].

Acc = \frac{\text{total correctly classified samples}}{\text{total dataset samples}}

(5)

\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}

(6)

\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}

(7)

\text{F-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

(8)

where True Positive is the number of intrusive event instances correctly classified as intrusive, False Positive is the number of normal event instances wrongly classified as intrusive, and False Negative is the number of intrusive event instances wrongly classified as normal.
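Equations (5) to (8) can be computed directly from true and predicted labels, as in the minimal sketch below, which treats one label (here the hypothetical label "attack") as the positive class. The usage example also shows why accuracy alone can be misleading on imbalanced datasets: a model that labels every flow as normal still scores high accuracy while detecting no attacks.

```python
def binary_metrics(y_true, y_pred, positive="attack"):
    """Accuracy, precision, recall and F-score from equations (5)-(8).

    `positive` is the label of intrusive events; every other label counts as normal.
    Ratios with a zero denominator are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f_score


# 95 normal flows and 5 attacks; a model that always predicts "normal".
y_true = ["normal"] * 95 + ["attack"] * 5
y_pred = ["normal"] * 100
print(binary_metrics(y_true, y_pred))  # (0.95, 0.0, 0.0, 0.0)
```

For multiclass problems, these quantities are commonly computed per class and then averaged, for example by macro-averaging.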

A summary of recently published articles in which data-driven models were deployed to analyze and classify IoT network security threats is presented in Table 3. Table 3 summarizes each study’s contributions, the dataset/test system employed, and the learning model(s) used in the security assessment.

As shown in Table 3, the deployment of data-driven AI models for IoT network intrusion analysis has been relatively successful: the reported results indicate that the attacks integrated into the various testbeds are reliably detected or classified. However, as most of the studies rely heavily on simulated testbeds and open-source datasets (some of which involve simulated attack integration), there is no guarantee that the proposed models will be reliably effective in real-life systems.

5. Research Gaps and Future Directions

Despite the exceptional performance achieved by data-driven AI models in the security analysis of IoT networks in the literature, several challenges and research gaps remain unsolved. Some of these gaps were identified from an analysis of the methodologies adopted in the reviewed articles. The key research gaps identified from the recently published articles are briefly discussed below.

The performance of the learning model(s) depends heavily on the heterogeneity and the qualitative and quantitative properties of the deployed datasets and test systems. The heterogeneous, dynamic, and diverse nature of typical real-life IoT networks demands that security analyses use IoT network setups that mimic real-life scenarios. However, due to the unavailability and inadequacy of data from real IoT networks, researchers have consistently turned to simulated datasets, often outdated open-source datasets, and/or scalable testbed developments, which often lack the characteristics of emerging real-life IoT networks. As a result, the data-driven models’ learning processes have often shown inconsistency.
Most of these benchmark datasets suffer from class imbalance and noisy and missing data, which can negatively impact the performance of the learning models. These issues typically arise during data generation, as IoT end devices can produce noisy, incomplete, or inconsistent data. For example, readings from location-determination sensors may be inconsistent due to environmental factors or technical issues, leading to erroneous data.
Apart from the number of input datasets, another important factor is the pre-processing steps. Various studies have explored a variety of models and algorithms to pre-process the data, with the sole aim of improving the learning model’s performance. While these methods have been quite effective, the proposed pipelines are often elaborate, leading to high computational complexity for the entire process. Thus, there is a need for effective yet scalable solutions. Additionally, tuning the parameters needed to achieve the desired results often requires expertise and makes the process time-consuming.
Emerging learning models, particularly deep learning models, are often black boxes whose decision-making processes are not easily interpretable or transparent to general users. In a security context, it is crucial to understand why a model classifies a given input as an attack or as normal behavior, as this helps security experts diagnose and respond to threats more effectively. Future research can also explore federated learning, in which model training is distributed across many decentralized devices or servers rather than relying on a centralized server to gather and store all the data. It allows multiple devices or clients to collaboratively train a shared model while keeping the data on local devices (such as smartphones, edge devices, or IoT devices), thereby preserving privacy and reducing the need for data transfer.
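The core aggregation step of federated learning can be sketched with federated averaging (FedAvg): each client trains locally on its own data and sends only model parameters, which the server averages, weighting each client by its number of training samples. The sketch below assumes a hypothetical linear model trained with plain gradient descent on squared error; real deployments add client sampling, secure communication, and other mechanisms not shown here.

```python
def local_update(weights, data, lr=0.1, epochs=1):
    """Client-side training: gradient descent on squared error for a linear model y ~ w . x.

    `data` is a list of (x_vector, y) pairs that never leaves the client.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def federated_average(client_weights, client_sizes):
    """Server-side FedAvg step: average parameters, weighting each client by its sample count."""
    total = sum(client_sizes)
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(len(client_weights[0]))
    ]


def federated_round(global_w, clients, lr=0.1):
    """One round: every client trains locally from the global model, then the server aggregates."""
    updates = [local_update(global_w, data, lr) for data in clients]
    return federated_average(updates, [len(data) for data in clients])


# Hypothetical clients whose local data all follow y = 2x.
clients = [
    [([1.0], 2.0), ([2.0], 4.0)],  # e.g., an edge gateway
    [([0.5], 1.0)],                # e.g., a single sensor
]
w = [0.0]
for _ in range(50):
    w = federated_round(w, clients)
print(w)  # approaches [2.0]
```

Only the parameter vectors returned by local_update cross the network in this sketch; the (x, y) pairs stay inside each client's list, which mirrors the privacy argument made above.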

6. Conclusions

The goal of improving quality of life, enhancing productivity, and unlocking new possibilities across homes, cities, applications, and industries has contributed to the geometric growth of IoT globally. As IoT technologies continue to grow, security and privacy across the different interconnected devices and systems pose significant challenges to their global dominance and user acceptance. This paper presents a comprehensive review of recent studies in which data-driven learning models were developed for IoT security analysis. The study highlights emerging IoT security and privacy issues, current conventional strategies and methodological procedures for mitigating these security challenges, and the significant progress and achievements, as well as, more importantly, the limitations and research gaps identified in recent applications of data-driven learning models to IoT security in the literature. In doing so, the study provides a detailed research-based resource for researchers working on IoT and general industrial control systems security. Finally, research gaps and recommendations for future work are presented.

Funding

This study received no external funding.

Conflicts of Interest

The author declares no conflicts of interest.

References

Alzubi, O.A.; Alzubi, J.A.; Qiqieh, I.; Al-Zoubi, A.M. An IoT Intrusion Detection Approach Based on Salp Swarm and Artificial Neural Network. Int. J. Netw. Manag. 2025, 35, e2296.
Alaba, F.A.; Othman, M.; Hashem, I.A.T.; Alotaibi, F. Internet of Things security: A survey. J. Netw. Comput. Appl. 2017, 88, 10–28.
Tu, T.F.; Qin, J.W.; Zhang, H.; Chen, M.; Xu, T.; Huang, Y. A comprehensive study of Mozi botnet. Int. J. Intell. Syst. 2022, 37, 6877–6908.
Alimi, O.A. Hybrid Data-Driven Learning-Based Internet of Things Network Intrusion Detection Model. In Proceedings of the 2024 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 29–31 May 2024; pp. 496–501.
Duguma, A.L.; Bai, X. How the internet of things technology improves agricultural efficiency. Artif. Intell. Rev. 2024, 58, 1–26.
Alimi, O.A.; Ouahada, K.; Abu-Mahfouz, A.M. A review of machine learning approaches to power system security and stability. IEEE Access 2020, 8, 113512–113531.
Margolis, J.; Oh, T.T.; Jadhav, S.; Kim, Y.H.; Kim, J.N. An in-depth analysis of the mirai botnet. In Proceedings of the International Conference on Software Security and Assurance (ICSSA), Altoona, PA, USA, 24–25 July 2017; pp. 6–12.
Alimi, O.A.; Ouahada, K.; Abu-Mahfouz, A.M.; Rimer, S.; Alimi, K.O.A. Supervised learning based intrusion detection for SCADA systems. In Proceedings of the 4th International Conference on Disruptive Technologies for Sustainable Development (NIGERCON), Abuja, Nigeria, 5–7 April 2022; pp. 1–5.
Alimi, O.A.; Ouahada, K.; Abu-Mahfouz, A.M.; Rimer, S.; Alimi, K.O.A. Intrusion Detection for Water Distribution Systems based on an Hybrid Particle Swarm Optimization with Back Propagation Neural Network. In Proceedings of the 2021 IEEE AFRICON, Virtual, 13–15 September 2021; pp. 1–5.
Alimi, O.A.; Ouahada, K.; Abu-Mahfouz, A.M.; Rimer, S.; Alimi, K.O.A. A review of research works on supervised learning algorithms for scada intrusion detection and classification. Sustainability 2021, 13, 9597.
Alimi, O.A.; Ouahada, K.; Abu-Mahfouz, A.M.; Rimer, S. Power system events classification using genetic algorithm based feature weighting technique for support vector machine. Heliyon 2021, 7, e05936.
Alimi, O.A.; Ouahada, K.; Abu-Mahfouz, A.M.; Alimi, K.O.A. Empirical comparison of machine learning algorithms for mitigating power systems intrusion attacks. In Proceedings of the International Symposium on Networks, Computers and Communications (ISNCC), Montreal, QC, Canada, 20–22 October 2020; pp. 1–5.
Ahmid, M.; Kazar, O. A Comprehensive Review of the Internet of Things Security. J. Appl. Secur. Res. 2023, 18, 289–305.
Randolph, K.; Hunt, M. Verkada: March 9, 2021 Security Incident Report. Available online: https://docs.verkada.com/docs/Security_Incident_Report_Version1.2.pdf (accessed on 24 April 2025).
Resul, D.; Gündüz, M.Z. Analysis of cyber-attacks in IoT-based critical infrastructures. Int. J. Inf. Secur. Sci. 2019, 8, 122–133.
FDA In Brief: FDA Warns Patients, Providers about Cybersecurity Concerns with Certain Medtronic Implantable Cardiac Devices Food and Drug Administration. 2018. Available online: https://www.fda.gov/news-events/fda-brief/fda-brief-fda-warns-patients-providers-about-cybersecurity-concerns-certain-medtronic-implantable (accessed on 7 March 2025).
Hassija, V.; Chamola, V.; Saxena, V.; Jain, D.; Goyal, P.; Sikdar, B. A survey on IoT security: Application areas, security threats, and solution architectures. IEEE Access 2019, 7, 82721–82743.
Ahmad, I.; Niazy, M.S.; Ziar, R.A.; Khan, S. Survey on IoT: Security threats and applications. J. Robot. Control. (JRC) 2021, 2, 42–46.
Hwang, Y.H. IoT security & privacy: Threats and challenges. In Proceedings of the 1st ACM Workshop on IoT Privacy, Trust, and Security, Singapore, 14 April 2015; p. 1.
Schiller, E.; Aidoo, A.; Fuhrer, J.; Stahl, J.; Ziörjen, M.; Stiller, B. Landscape of IoT security. Comput. Sci. Rev. 2022, 44, 100467.
Kamalov, F.; Pourghebleh, B.; Gheisari, M.; Liu, Y.; Moussa, S. Internet of medical things privacy and security: Challenges, solutions, and future trends from a new perspective. Sustainability 2023, 15, 3317.
WEF, The Global Risks Report 2019. Available online: https://www.weforum.org/publications/the-global-risks-report-2019/ (accessed on 6 March 2025).
Mckinsey and Company. What is Cybersecurity? Available online: https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-cybersecurity (accessed on 6 March 2025).
Zoting, S.; Shivarkar, A. IoT Security Market Size, Share, And Trends 2025 to 2034. Available online: https://www.precedenceresearch.com/iot-security-market (accessed on 6 March 2025).
Khan, A.R.; Kashif, M.; Jhaveri, R.H.; Raut, R.; Saba, T.; Bahaj, S.A. Deep learning for intrusion detection and security of Internet of things (IoT): Current analysis, challenges, and possible solutions. Secur. Commun. Netw. 2022, 2022, 4016073.
Reinsel, D.; Gantz, J.; Rydning, J. Data Age 2025: The Evolution of Data to Life-Critical. Don’t Focus on Big Data; Focus on the Data That’s Big. Available online: https://www.seagate.com/files/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf (accessed on 6 March 2025).
HaddadPajouh, H.; Dehghantanha, A.; Parizi, R.M.; Aledhari, M.; Karimipour, H. A survey on internet of things security: Requirements, challenges, and solutions. Internet Things 2019, 14, 100129. [Google Scholar] [CrossRef]
Kouicem, D.E.; Bouabdallah, A.; Lakhlef, H. Internet of things security: A top-down survey. Comput. Netw. 2018, 141, 199–221. [Google Scholar] [CrossRef]
Yang, W.; Wang, S.; Sahri, N.M.; Karie, N.M.; Ahmed, M.; Valli, C. Biometrics for internet-of-things security: A review. Sensors 2021, 21, 6163. [Google Scholar] [CrossRef]
Sadhu, P.K.; Yanambaka, V.P.; Abdelgawad, A. Internet of things: Security and solutions survey. Sensors 2022, 22, 7433. [Google Scholar] [CrossRef]
Tange, K.; De Donno, M.; Fafoutis, X.; Dragoni, N. A systematic survey of industrial internet of things security: Requirements and fog computing opportunities. IEEE Commun. Surv. Tutorials 2020, 22, 2489–2520. [Google Scholar] [CrossRef]
Hossain, M.M.; Fotouhi, M.; Hasan, R. Towards an analysis of security issues, challenges, and open problems in the internet of things. In Proceedings of the 2015 IEEE World Congress on Services, New York, NY, USA, 27 June–2 July 2015; pp. 21–28. [Google Scholar]
Jamali, M.J.; Bahrami, B.; Heidari, A.; Allahverdizadeh, P.; Norouzi, F. Towards the Internet of Things: Architectures, Security, and Applications; Springer International Publishing: Cham, Switzerland, 2020. [Google Scholar]
Giri, A.; Dutta, S.; Neogy, S.; Dahal, K.; Pervez, Z. Internet of Things (IoT) a survey on architecture, enabling technologies, applications and challenges. In Proceedings of the 1st International Conference on Internet of Things and Machine Learning, New York, NY, USA, 17–18 October 2017; pp. 1–12. [Google Scholar]
Yıldırım, M.; Demiroğlu, U.; Şenol, B. An in-depth exam of IoT, IoT core components, IoT layers, and attack types. Avrupa Bilim Ve Teknol. Derg. 2021, 28, 665–669. [Google Scholar]
Patnaik, R.; Padhy, N.; Raju, K.S. A systematic survey on IoT security issues, vulnerability and open challenges. In Proceedings of the Intelligent System Design: INDIA 2019; Springer: Singapore, 2020; pp. 723–730. [Google Scholar]
Darwish, D. Improved layered architecture for Internet of Things. Intl. J. Comput. Acad. Res. (IJCAR) 2015, 4, 214–223. [Google Scholar]
Kumar, N.M.; Mallick, P.K. The Internet of Things: Insights into the building blocks, component interactions, and architecture layers. Procedia Comput. Sci. 2018, 132, 109–117. [Google Scholar] [CrossRef]
Zhong, C.; Zhu, Z.; Huang, R. Study on the IoT architecture and gateway technology. In Proceedings of the 14th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), Guiyang, China, 18–24 August 2015; pp. 196–199. [Google Scholar]
Antao, L.; Pinto, R.; Reis, J.; Gonçalves, G. Requirements for testing and validating the industrial internet of things. In Proceedings of the 2018 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), Västerås, Sweden, 9–13 April 2018; pp. 110–115. [Google Scholar]
Mishra, N.; Pandya, S. Internet of things applications, security challenges, attacks, intrusion detection, and future visions: A systematic review. IEEE Access 2021, 9, 59353–59377. [Google Scholar] [CrossRef]
Liang, X.; Kim, Y. A survey on security attacks and solutions in the IoT network. In Proceedings of the 2021 IEEE 11th annual computing and communication workshop and conference (CCWC), Virtual, 27–30 January 2021; p. 853. [Google Scholar]
Mrabet, H.; Belguith, S.; Alhomoud, A.; Jemai, A. A Survey of IoT security based on a layered architecture of sensing and data analysis. Sensors 2020, 20, 3625. [Google Scholar] [CrossRef] [PubMed]
Liang, L.; Zheng, K.; Sheng, Q.; Huang, X. A denial of service attack method for an IoT system. In Proceedings of the 8th International Conference on Information Technology in Medicine and Education (ITME), Fuzhou, China, 23–25 December 2016; pp. 360–364. [Google Scholar]
MIT Technology Review. The First DDoS Attack was 20 Years Ago. This is What we’ve Learned Since. Available online: https://www.technologyreview.com/2019/04/18/103186/the-first-ddos-attack-was-20-years-ago-this-is-what-weve-learned-since/ (accessed on 6 March 2025).
Salim, M.M.; Rathore, S.; Park, J.H. Distributed denial of service attacks and its defenses in IoT: A survey. J. Supercomput. 2020, 76, 5320–5363. [Google Scholar] [CrossRef]
Kupreev, O.; Gutnikov, A.; Shmelev, Y. DDoS Attacks in Q3 2022. Available online: https://stormwall.network/resources/blog/ddos-attacks-report-q3-2022 (accessed on 6 March 2025).
DoS Attack (Denial of Service). Available online: https://www.wallarm.com/what/dos-denial-of-service-attack (accessed on 6 March 2025).
Bhushan, B.; Sahoo, G.; Rai, A.K. Man-in-the-middle attack in wireless and computer networking—A review. In Proceedings of the 3rd International Conference on Advances in Computing, Communication & Automation (ICACCA)(Fall), Dehradun, India, 15–16 September 2017; pp. 1–6. [Google Scholar]
Mallik, A. MAN-in-the-middle-attack: Understanding in simple words. Cyberspace J. Pendidik. Teknol. Inf. 2019, 2, 109–134. [Google Scholar] [CrossRef]
Salem, O.; Alsubhi, K.; Shaafi, A.; Gheryani, M.; Mehaoua, A.; Boutaba, R. Man-in-the-middle attack mitigation in internet of medical things. IEEE Trans. Ind. Informatics 2021, 18, 2053–2062. [Google Scholar] [CrossRef]
Palatty, N.J. How Many Cyber Attacks Per Day: The Latest Stats and Impacts in 2024. Available online: https://www.getastra.com/blog/security-audit/how-many-cyber-attacks-per-day/ (accessed on 6 March 2025).
Humayun, M.; Jhanjhi, N.Z.; Alsayat, A.; Ponnusamy, V. Internet of things and ransomware: Evolution, mitigation and prevention. Egypt. Inform. J. 2021, 22, 105–117. [Google Scholar] [CrossRef]
Zahra, S.R.; Chishti, M.A. Ransomware and internet of things: A new security nightmare. In Proceedings of the 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 10–11 January 2019; pp. 551–555. [Google Scholar]
Molina, R.M.A.; Bou-Harb, E.; Torabi, S.; Assi, C. RPM: Ransomware Prevention and Mitigation Using Operating Systems’ Sensing Tactics. In Proceedings of the IEEE International Conference on Communications, Rome, Italy, 28 May–1 June 2023; pp. 1–6. [Google Scholar]
Alelyani, S.; Kumar, G.R.H. Overview of cyberattack on Saudi organizations. J. Inf. Secur. Cybercrimes Res. 2018, 1, 32–39. [Google Scholar] [CrossRef]
Ransomware Trends Report. Available online: https://shorturl.at/AZtJ2 (accessed on 6 March 2025).
Mahamat, M.; Jaber, G.; Bouabdallah, A. Achieving efficient energy-aware security in IoT networks: A survey of recent solutions and research challenges. Wirel. Netw. 2023, 29, 787–808. [Google Scholar] [CrossRef]
Wei, L.; Rondon, L.P.; Moghadasi, A.; Sarwat, A.I. Review of cyber-physical attacks and counter defense mechanisms for advanced metering infrastructure in smart grid. In Proceedings of the IEEE/PES Transmission and Distribution Conference and Exposition (T&D), Chicago, IL, USA, 16 April 2018; pp. 1–9. [Google Scholar]
Dzansi, D.Y.; Rambe, P.; Mathe, L. Cable Theft and vandalism by employees of South Africa’s electricity utility companies: A theoretical explanation and research agenda. J. Soc. Sci. 2014, 39, 179–190. [Google Scholar] [CrossRef]
Degambur, L.-N. Replay Attack Prevention in Decentralised Contact Tracing: A Blockchain-Based Approach. Open Access Libr. J. 2024, 11, 1–17. [Google Scholar] [CrossRef]
Gargoum, S. Data-Driven Detection and Verification of Replay Attacks on Industrial Control Systems. Ph.D. Dissertation, University of British Columbia, Vancouver, BC, Canada, 2023. [Google Scholar]
Rekha, S.; Thirupathi, L.; Renikunta, S.; Gangula, R. Study of security issues and solutions in Internet of Things (IoT). Mater. Today Proc. 2023, 80, 3554–3559. [Google Scholar] [CrossRef]
Doukas, C.; Maglogiannis, I.; Koufi, V.; Malamateniou, F.; Vassilacopoulos, G. Enabling data protection through PKI encryption in IoT m-Health devices. In Proceedings of the 12th International Conference on Bioinformatics & Bioengineering (BIBE), Larnaca, Cyprus, 11–13 November 2012; pp. 25–29. [Google Scholar]
Shahzad, K.; Zia, T.; Qazi, E.-U. A review of functional encryption in IoT applications. Sensors 2022, 22, 7567. [Google Scholar] [CrossRef]
Chandu, Y.; Kumar, K.R.; Prabhukhanolkar, N.V.; Anish, A.N.; Rawal, S. Design and implementation of hybrid encryption for security of IoT data. In Proceedings of the International Conference on Smart Technologies for Smart Nation (SmartTechCon), Singapore, 17–19 August 2017; pp. 1228–1231. [Google Scholar]
Dorri, A.; Kanhere, S.S.; Jurdak, R.; Gauravaram, P. Blockchain for IoT security and privacy: The case study of a smart home. In Proceedings of the International Conference on Pervasive Computing and Communications Workshops (PerCom workshops), Washington DC, USA, 13–17 March 2017; pp. 618–623. [Google Scholar]
Wang, Q.; Zhu, X.; Ni, Y.; Gu, L.; Zhu, H. Blockchain for the IoT and industrial IoT: A review. Internet Things 2020, 10, 100081. [Google Scholar] [CrossRef]
Minoli, D.; Occhiogrosso, B. Blockchain mechanisms for IoT security. Internet Things 2018, 1, 1–13. [Google Scholar] [CrossRef]
Bandara, E.; Tosh, D.; Foytik, P.; Shetty, S.; Ranasinghe, N.; De Zoysa, K. Tikiri—Towards a lightweight blockchain for IoT. Future Gener. Comput. Syst. 2021, 119, 154–165. [Google Scholar] [CrossRef]
Naha, A.; Teixeira, A.M.H.; Ahlén, A.; Dey, S. Sequential detection of replay attacks. IEEE Trans. Autom. Control 2022, 68, 1941–1948. [Google Scholar] [CrossRef]
Google Safety Centre. Background Check, A Behind-the-Scenes Look at How Google is Making the Internet More Secure. Available online: https://safety.google/stories/backend/ (accessed on 6 March 2025).
Amazon Web Services. AWS Security Reference Architecture. Available online: https://docs.aws.amazon.com/prescriptive-guidance/latest/security-reference-architecture/welcome.html (accessed on 6 March 2025).
Xiao, L.; Wan, X.; Lu, X.; Zhang, Y.; Wu, D. IoT security techniques based on machine learning: How do IoT devices use AI to enhance security? IEEE Signal Process. Mag. 2018, 35, 41–49. [Google Scholar] [CrossRef]
Aldahiri, A.; Alrashed, B.; Hussain, W. Trends in using IoT with machine learning in health prediction system. Forecasting 2021, 3, 181–206. [Google Scholar] [CrossRef]
Bhayo, J.; Shah, S.A.; Hameed, S.; Ahmed, A.; Nasir, J.; Draheim, D. Towards a machine learning-based framework for DDoS attack detection in software-defined IoT (SD-IoT) networks. Eng. Appl. Artif. Intell. 2023, 123, 106432. [Google Scholar] [CrossRef]
Islam, N.; Farhin, F.; Sultana, I.; Kaiser, M.S.; Rahman, S.; Mahmud, M.; Hosen, A.S.M.S.; Cho, G.H. Towards Machine Learning Based Intrusion Detection in IoT Networks. Comput. Mater. Contin. 2021, 69, 1801–1821. [Google Scholar] [CrossRef]
Parmisano, A.; Garcia, S.; Erquiaga, M.J. A Labeled Dataset with Malicious and Benign IoT Network Traffic; Stratosphere Laboratory: Praha, Czech Republic, 2020. [Google Scholar]
Alfares, H.; Banimelhem, O. Comparative Analysis of Machine Learning Techniques for Handling Imbalance in IoT-23 Dataset for Intrusion Detection Systems. In Proceedings of the 11th International Conference on Internet of Things: Systems, Management and Security (IOTSMS), Malmö, Sweden, 2–5 September 2024; pp. 112–119. [Google Scholar]
Abdalgawad, N.; Sajun, A.; Kaddoura, Y.; Zualkernan, I.A.; Aloul, F. Generative deep learning to detect cyberattacks for the IoT-23 dataset. IEEE Access 2021, 10, 6430–6441. [Google Scholar] [CrossRef]
Nanthiya, D.; Keerthika, P.; Gopal, S.B.; Kayalvizhi, S.B.; Raja, T.; Priya, R.S. SVM based DDoS attack detection in IoT using IoT-23 botnet dataset. In Proceedings of the 2021 Innovations in Power and Advanced Computing Technologies (i-PACT), Kuala Lumpur, Malaysia, 27–29 November 2021; pp. 1–7. [Google Scholar]
Jeelani, F.; Rai, D.S.; Maithani, A.; Gupta, S. The detection of IoT botnet using machine learning on IoT-23 dataset. In Proceedings of the 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM); Volume 2, pp. 634–639.
Dutta, V.; Choraś, M.; Pawlicki, M.; Kozik, R. Detection of Cyberattacks Traces in IoT Data. JUCS J. Univers. Comput. Sci. 2020, 26, 1422–1434. [Google Scholar] [CrossRef]
Ahli, A.; Raza, A.; Akpinar, K.O.; Akpinar, M. Binary and Multi-Class Classification on the IoT-23 Dataset. In Proceedings of the 2023 Advances in Science and Engineering Technology International Conferences (ASET), Dubai, United Arab Emirates, 20–23 February 2023; pp. 1–7. [Google Scholar]
Wongvorachan, T.; He, S.; Bulut, O. A comparison of undersampling, oversampling, and SMOTE methods for dealing with imbalanced classification in educational data mining. Information 2023, 14, 54. [Google Scholar] [CrossRef]
Zeeshan, M.; Riaz, Q.; Bilal, M.A.; Shahzad, M.K.; Jabeen, H.; Haider, S.A.; Rahim, A. Protocol-Based deep intrusion detection for DoS and DDoS attacks using UNSW-NB15 and Bot-IoT data-sets. IEEE Access 2021, 10, 2269–2283. [Google Scholar] [CrossRef]
Brandt, J.; Lanzén, E. A Comparative Review of SMOTE and ADASYN in Imbalanced Data Classification. Dissertation, Uppsala University, Uppsala, Sweden, 2021. Available online: https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-432162 (accessed on 24 April 2025).
Pradipta, G.A.; Wardoyo, R.; Musdholifah, A.; Sanjaya, I.N.H.; Ismail, M. SMOTE for Handling Imbalanced Data Problem: A Review. In Proceedings of the 2021 Sixth International Conference on Informatics and Computing (ICIC), Virtual, 3–4 November 2021; pp. 1–8. [Google Scholar]
He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328. [Google Scholar]
EL Hariri, A.; Mouiti, M.; Habibi, O.; Lazaar, M. Improving Deep Learning Performance Using Sampling Techniques for IoT Imbalanced Data. Procedia Comput. Sci. 2023, 224, 180–187. [Google Scholar] [CrossRef]
Stoian, N. Machine Learning for Anomaly Detection in IoT Networks: Malware Analysis on the IoT-23 Data Set. Bachelor’s Thesis, University of Twente, Enschede, The Netherlands, 2020. [Google Scholar]
Sarhan, M.; Layeghy, S.; Moustafa, N.; Gallagher, M.; Portmann, M. Feature extraction for machine learning-based intrusion detection in IoT networks. Digit. Commun. Netw. 2024, 10, 205–216. [Google Scholar] [CrossRef]
Sarhan, M.; Layeghy, S.; Portmann, M. Feature analysis for machine learning-based IoT intrusion detection. arXiv 2021, arXiv:2108.12732. [Google Scholar]
Moustafa, N.; Ahmed, M.; Ahmed, S. Data analytics-enabled intrusion detection: Evaluations of ToN_IoT linux datasets. In Proceedings of the 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China, 29 December 2020–1 January 2021; pp. 727–735. [Google Scholar]
Elsayed, R.A.; Hamada, R.A.; Abdalla, M.I.; Elsaid, S.A. Securing IoT and SDN systems using deep-learning based automatic intrusion detection. Ain Shams Eng. J. 2023, 14, 102211. [Google Scholar] [CrossRef]
Guo, G. A machine learning framework for intrusion detection system in IoT networks using an ensemble feature selection method. In Proceedings of the IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 27–30 October 2021; p. 593. [Google Scholar]
Guo, G. An intrusion detection system for the internet of things using machine learning models. In Proceedings of the 3rd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), Nanchang, China, 15–17 July 2022; pp. 332–335. [Google Scholar]
Guo, G.; Pan, X.; Liu, H.; Li, F.; Pei, L.; Hu, K. An IoT intrusion detection system based on TON IoT network dataset. In Proceedings of the 13th Annual Computing and Communication Workshop and Conference (CCWC), Virtual, 8 March 2023; p. 333. [Google Scholar]
Gad, A.R.; Nashat, A.A.; Barkat, T.M. Intrusion detection system using machine learning for vehicular Ad Hoc networks based on ToN-IoT dataset. IEEE Access 2021, 9, 142206–142217. [Google Scholar] [CrossRef]
Cao, Z.; Zhao, Z.; Shang, W.; Ai, S.; Shen, S. Using the ToN-IoT dataset to develop a new intrusion detection system for industrial IoT devices. Multimedia Tools Appl. 2024, 83, 1–29. [Google Scholar] [CrossRef]
Gad, A.R.; Haggag, M.; Nashat, A.A.; Barakat, T.M. A distributed intrusion detection system using machine learning for IoT based on ToN-IoT dataset. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 130667. [Google Scholar] [CrossRef]
Shtayat, M.M.; Hasan, M.K.; Sulaiman, R.; Islam, S.; Khan, A.U.R. An explainable ensemble deep learning approach for intrusion detection in industrial internet of things. IEEE Access 2023, 11, 115047–115061. [Google Scholar] [CrossRef]
Koroniotis, N.; Moustafa, N.; Sitnikova, E. A new network forensic framework based on deep learning for Internet of Things networks: A particle deep framework. Future Gener. Comput. Syst. 2020, 110, 91–106. [Google Scholar] [CrossRef]
Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the development of realistic botnet dataset in the Internet of things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst. 2019, 100, 779–796. [Google Scholar] [CrossRef]
Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Slay, J. Towards developing network forensic mechanism for botnet activities in the IoT based on machine learning techniques. Mob. Netw. Manag. 2019, 100, 30–44. [Google Scholar]
Shafiq, M.; Tian, Z.; Bashir, A.K.; Du, X.; Guizani, M. CorrAUC: A malicious Bot-IoT traffic detection method in IoT network using machine-learning techniques. IEEE Internet Things J. 2020, 8, 3242–3254. [Google Scholar] [CrossRef]
Pokhrel, S.; Abbas, R.; Aryal, B. IoT security: Botnet detection in IoT using machine learning. arXiv 2021, arXiv:2104.02231. [Google Scholar]
Shafiq, M.; Tian, Z.; Sun, Y.; Du, X.; Guizani, M. Selection of effective machine learning algorithm and Bot-IoT attacks traffic identification for internet of things in smart city. Future Gener. Comput. Syst. 2020, 107, 433–442. [Google Scholar] [CrossRef]
Leevy, J.L.; Hancock, J.; Khoshgoftaar, T.M.; Peterson, J. Detecting information theft attacks in the Bot-IoT dataset. In Proceedings of the 20th IEEE International Conference on Machine Learning and Applications (ICMLA), Pasadena, CA, USA, 13–16 December 2021; pp. 807–812. [Google Scholar]
Pavaiyarkarasi, R.; Manimegalai, T.; Satheeshkumar, S.; Dhivya, K.; Ramkumar, G. A Productive Feature Selection Criterion for Bot-IoT Recognition based on Random Forest Algorithm. In Proceedings of the IEEE 11th International Conference on Communication Systems and Network Technologies (CSNT), Indore, India, 23–24 April 2022; pp. 539–545. [Google Scholar]
Jayalaxmi, P.; Kumar, G.; Saha, R.; Conti, M.; Kim, T.-H.; Thomas, R. DeBot: A deep learning-based model for bot detection in industrial internet-of-things. Comput. Electr. Eng. 2022, 102, 108214. [Google Scholar] [CrossRef]
Atuhurra, J.; Hara, T.; Zhang, Y.; Sasabe, M.; Kasahara, S. Dealing with Imbalanced Classes in Bot-IoT Dataset. arXiv 2024, arXiv:2403.18989. [Google Scholar]
Ibrahimi, K.; Benaddi, H. Improving the IDS for Bot-IoT dataset-based machine learning classifiers. In Proceedings of the 5th International Conference on Advanced Communication Technologies and Networking (CommNet), Marrakech, Morocco, 12–14 December 2022; pp. 1–6. [Google Scholar]
Leevy, J.L.; Hancock, J.; Khoshgoftaar, T.M.; Peterson, J.M. An easy-to-classify approach for the Bot-IoT dataset. In Proceedings of the 2021 IEEE Third International Conference on Cognitive Machine Intelligence (CogMI), Atlanta, GA, USA, 13–15 December 2021; pp. 172–179. [Google Scholar]
Ullah, I.; Mahmoud, Q.H. A scheme for generating a dataset for anomalous activity detection in IoT networks. In Proceedings of the Canadian Conference on Artificial Intelligence, Ottawa, ON, Canada, 6 May 2020; pp. 508–520. [Google Scholar]
Al-Akhras, M.; Alshunaybir, A.; Omar, H.; Alhazmi, S. Botnet attacks detection in IoT environment using machine learning techniques. Int. J. Data Netw. Sci. 2023, 7, 1683–1706. [Google Scholar] [CrossRef]
Alsaaidah, A.; Almomani, O.; Abu-Shareha, A.A.; Abualhaj, M.M.; Achuthan, A. ARP Spoofing Attack Detection Model in IoT Network using Machine Learning: Complexity vs. Accuracy. J. Appl. Data Sci. 2024, 5, 1850–1860. [Google Scholar] [CrossRef]
Elmahfoud, E.; Elhajla, S.; Maleh, Y.; Mounir, S. Machine Learning Algorithms for Intrusion Detection in IoT Prediction and Performance Analysis. Procedia Comput. Sci. 2024, 236, 460–467. [Google Scholar] [CrossRef]
Maniriho, P.; Niyigaba, E.; Bizimana, Z.; Twiringiyimana, V.; Mahoro, L.J.; Ahmad, T. Anomaly-based intrusion detection approach for IoT networks using machine learning. In Proceedings of the 2020 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM), Surabaya, Indonesia, 17–18 November 2020; pp. 303–308. [Google Scholar]
Doghramachi, D.F.; Ameen, S.Y. Internet of Things (IoT) Security Enhancement Using XGboost Machine Learning Techniques. Comput. Mater. Contin. 2023, 77, 717–732. [Google Scholar] [CrossRef]
Verma, R.; Chandra, S. RepuTE: A soft voting ensemble learning framework for reputation-based attack detection in fog-IoT milieu. Eng. Appl. Artif. Intell. 2023, 118, 105670. [Google Scholar] [CrossRef]
Le, T.-T.; Shin, Y.; Kim, M.; Kim, H. Towards unbalanced multiclass intrusion detection with hybrid sampling methods and ensemble classification. Appl. Soft Comput. 2024, 157, 111517. [Google Scholar] [CrossRef]
Yan, Y.; Yang, Y.; Gu, Y.; Shen, F. A Few-Shot Intrusion Detection Model for the Internet of Things. In Proceedings of the 3rd International Conference on Electronic Information Engineering and Computer Science (EIECS), Changchun, China, 22–24 September 2023; pp. 531–537. [Google Scholar]
Iorliam, A. A Novel Additive Internet of Things (IoT) Features and Convolutional Neural Network for Classification and Source Identification of IoT Devices. Sak. Univ. J. Comput. Inf. Sci. 2023, 6, 218–225. [Google Scholar] [CrossRef]
Van Huong, P.; Minh, N.H. Improving the feature set in IoT intrusion detection problem based on FP-Growth Algorithm. In Proceedings of the 2020 International Conference on Advanced Technologies for Communications (ATC), Nha Trang, Vietnam, 8–10 October 2020; pp. 18–23. [Google Scholar]
Selvam, R.; Velliangiri, S. An Improving Intrusion Detection Model Based on Novel CNN Technique Using Recent CIC-IDS Datasets. In Proceedings of the 2024 International Conference on Distributed Computing and Optimization Techniques (ICDCOT), Bengaluru, India, 15–16 March 2024; pp. 1–6. [Google Scholar]
Elhanashi, A.; Gasmi, K.; Begni, A.; Dini, P.; Zheng, Q.; Saponara, S. Machine learning techniques for anomaly-based detection system on CSE-CIC-IDS2018 dataset. In Proceedings of the International Conference on Applications in Electronics Pervading Industry, Environment and Society, Genoa, Italy, 2 September 2022; pp. 131–140. [Google Scholar]
Hagar, A.A.; Gawali, B.W. Deep learning for improving attack detection system using CSE-CICIDS2018. NeuroQuantology 2022, 20, 3064. [Google Scholar]
Kurniabudi; Stiawan, D.; Darmawijoyo; Bin Idris, M.Y.; Bamhdi, A.M.; Budiarto, R. CICIDS-2017 dataset feature analysis with information gain for anomaly detection. IEEE Access 2020, 8, 132911–132921. [Google Scholar] [CrossRef]
Maseer, Z.K.; Yusof, R.; Bahaman, N.; Mostafa, S.A.; Foozy, C.F.M. Benchmarking of machine learning for anomaly based intrusion detection systems in the CICIDS2017 dataset. IEEE Access 2021, 9, 22351–22370. [Google Scholar] [CrossRef]
Pelletier, Z.; Abualkibash, M. Evaluating the CIC IDS-2017 dataset using machine learning methods and creating multiple predictive models in the statistical computing language R. Science 2020, 5, 187–191. [Google Scholar]
Peterson, J.M.; Leevy, J.L.; Khoshgoftaar, T.M. A review and analysis of the Bot-IoT dataset. In Proceedings of the 2021 IEEE International Conference on Service-Oriented System Engineering (SOSE), Oxford, UK, 23–26 August 2021; pp. 20–27. [Google Scholar]
FortiGuard AI-Powered Security Services. Available online: https://www.fortinet.com/solutions/enterprise-midsize-business/security-as-a-service/fortiguard-subscriptions (accessed on 6 March 2025).
Ullah, I.; Mahmoud, Q.H. Design and development of a deep learning-based model for anomaly detection in IoT networks. IEEE Access 2021, 9, 103906–103926. [Google Scholar] [CrossRef]
Wu, Z.; Zhang, H.; Wang, P.; Sun, Z. RTIDS: A robust transformer-based approach for intrusion detection system. IEEE Access 2022, 10, 64375–64387. [Google Scholar] [CrossRef]
Sana, L.; Nazir, M.M.; Yang, J.; Hussain, L.; Chen, Y.-L.; Ku, C.S.; Alatiyyah, M.; Alateyah, S.A.; Por, L.Y. Securing the IoT Cyber Environment: Enhancing Intrusion Anomaly Detection With Vision Transformers. IEEE Access 2024, 12, 82443–82468. [Google Scholar] [CrossRef]
Benaddi, H.; Ibrahimi, K.; Benslimane, A.; Qadir, J. A deep reinforcement learning based intrusion detection system (DRL-IDS) for securing wireless sensor networks and internet of things. In Proceedings of the 12th EAI International Conference, WiCON 2019, Taichung, Taiwan, 26–27 November 2019; pp. 73–87. [Google Scholar]
Tharewal, S.; Ashfaque, M.W.; Banu, S.S.; Uma, P.; Hassen, S.M.; Shabaz, M. Intrusion detection system for industrial internet of things based on deep reinforcement learning. Wirel. Commun. Mob. Comput. 2022, 2022, 1–8. [Google Scholar] [CrossRef]
Ren, K.; Zeng, Y.; Cao, Z.; Zhang, Y. ID-RDRL: A deep reinforcement learning-based feature selection intrusion detection model. Sci. Rep. 2022, 12, 1–18. [Google Scholar] [CrossRef]
Otoum, S.; Kantarci, B.; Mouftah, H. Empowering reinforcement learning on big sensed data for intrusion detection. In Proceedings of the IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019; pp. 1–7. [Google Scholar]
Al-Fawa’reh, M.; Abu-Khalaf, J.; Szewczyk, P.; Kang, J.J. MalBoT-DRL: Malware botnet detection using deep reinforcement learning in IoT networks. IEEE Internet Things J. 2023, 11, 9610–9629. [Google Scholar] [CrossRef]
Ullah, S.; Ahmad, J.; Khan, M.A.; Alshehri, M.S.; Boulila, W.; Koubaa, A.; Jan, S.U.; Ch, M.M.I. TNN-IDS: Transformer neural network-based intrusion detection system for MQTT-enabled IoT Networks. Comput. Netw. 2023, 237, 110072. [Google Scholar] [CrossRef]
Akuthota, U.C.; Bhargava, L. Transformer-Based Intrusion Detection for IoT Networks. IEEE Internet Things J. 2025, 12, 6062–6067. [Google Scholar] [CrossRef]
Khanday, S.A.; Fatima, H.; Rakesh, N. Towards the Development of an Ensemble Intrusion Detection Model for DDoS and Botnet Mitigation using the IoT-23 Dataset. J. Harbin Eng. Univ. 2023, 44, 562–578. [Google Scholar]
Vitorino, J.; Andrade, R.; Praça, I.; Sousa, O.; Maia, E. A comparative analysis of machine learning techniques for IoT intrusion detection. In Proceedings of the International Symposium on Foundations and Practice of Security, Paris, France, 7 December 2021; pp. 191–207. [Google Scholar]
Zhang, C.; Jia, D.; Wang, L.; Wang, W.; Liu, F.; Yang, A. Comparative research on network intrusion detection methods based on machine learning. Comput. Secur. 2022, 121, 102861. [Google Scholar] [CrossRef]
de Souza, C.A.; Westphall, C.B.; Machado, R.B. Two-step ensemble approach for intrusion detection and identification in IoT and fog computing environments. Comput. Electr. Eng. 2022, 98, 107694. [Google Scholar] [CrossRef]
Kabir, M.H.; Rajib, M.S.; Rahman, A.S.M.T.; Rahman, M.M.; Dey, S.K. Network intrusion detection using UNSW-NB15 dataset: Stacking machine learning based approach. In Proceedings of the 2022 International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE), Gazipur, Bangladesh, 24–26 February 2022; pp. 1–6. [Google Scholar]
Alimi, O.A.; Ouahada, K.; Abu-Mahfouz, A.M. Real time security assessment of the power system using a hybrid support vector machine and multilayer perceptron neural network algorithms. Sustainability 2019, 11, 3586. [Google Scholar] [CrossRef]
Wang, M.; Yang, N.; Weng, N. Securing a smart home with a transformer-based IoT intrusion detection system. Electronics 2023, 12, 2100. [Google Scholar] [CrossRef]

Figure 1. Forecast of the number of IoT-connected devices globally over the next decade [5].

Figure 2. Forecast of IoT security market size over the next 5 years [24].

Figure 3. IoT architecture variants [34,35,36].

Figure 4. DoS attack description [47,48].

Figure 5. MiTM attack description [49].

Figure 6. Ransomware attack description [53].

Table 1. Comparison of some popular traditional security methods that are deployed across different layers/domains of typical IoT networks [2,4,13,15,17].

IoT Layer/Domain	Security Measures	Strengths	Limitations
Node devices/hardware devices	Tamper seals	Offer visible and digital evidence of intrusions. Seals cannot be restored once unsealed. Build trust and deter tampering with physical devices.	Easily triggered by environmental factors. Expensive to maintain.
	Firmware updates	Quickly fix exploitable bugs. Improve performance (both functional and non-functional aspects).	Compatibility issues. The update-trigger mechanism itself can be exploited.
	Secure element	Tamper resistance against physical attacks and injections. Secure storage for cryptographic keys and certificates.	Implementation cost and integration complexity.
Data communication/Network protocol	Encryption, authentication, and access control models	Scalable, cheap, and efficient. Provide security against a wide range of attacks.	Cumbersome and computationally expensive.
Data communication/Network protocol	Blockchain	Decentralized and transparent security solution. Immutable records.	Costly and energy-inefficient; scalability concerns. Implementation complexity.
Application layer	Secure bootstrapping techniques	Highly effective solution.	Expensive. Scalability issues.
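To make the authentication entry in Table 1 concrete, the following Python sketch shows message authentication with a keyed hash (HMAC-SHA256) from the standard library. It is a minimal illustration rather than a method proposed in the reviewed studies: it assumes a symmetric key pre-shared between a device and its gateway, and the key value and the `sign`/`verify` helper names are hypothetical. An HMAC provides integrity and origin authentication only; confidentiality would additionally require encryption, and replay protection would require a nonce or timestamp inside the signed payload.

```python
import hashlib
import hmac

# Hypothetical pre-shared key, e.g. provisioned into a device's secure element.
DEVICE_KEY = b"example-pre-shared-key"


def sign(payload: bytes, key: bytes = DEVICE_KEY) -> bytes:
    """Return an HMAC-SHA256 tag (32 bytes) over the sensor payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(payload: bytes, tag: bytes, key: bytes = DEVICE_KEY) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign(payload, key), tag)


reading = b'{"sensor": "temp-01", "value": 21.5}'
tag = sign(reading)
print(verify(reading, tag))                                   # True
print(verify(b'{"sensor": "temp-01", "value": 99.9}', tag))   # False: payload altered
```

The gateway accepts a reading only when `verify` returns `True`; any change to the payload or use of a different key produces a mismatching tag.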
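The immutability attributed to blockchain in Table 1 rests on hash chaining: each record stores the hash of its predecessor, so altering an earlier record breaks the links that follow it. The Python sketch below illustrates only that property under simplifying assumptions (a single local list, SHA-256, hypothetical helper names). It omits the distributed consensus, signatures, and replication that real blockchain platforms depend on, which is also where the cost and energy concerns listed in the table arise.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, data: str, prev_hash: str) -> str:
    """SHA-256 over the block contents, including the previous block's hash."""
    record = json.dumps({"index": index, "data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def build_chain(entries):
    """Link each entry to its predecessor through the predecessor's hash."""
    chain, prev = [], GENESIS
    for i, data in enumerate(entries):
        h = block_hash(i, data, prev)
        chain.append({"index": i, "data": data, "prev": prev, "hash": h})
        prev = h
    return chain


def is_valid(chain) -> bool:
    """Valid only if every stored hash matches its recomputed value and links back correctly."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        prev = block["hash"]
    return True


chain = build_chain(["door unlocked", "temp=21.5", "door locked"])
print(is_valid(chain))          # True
chain[1]["data"] = "temp=35.0"  # tamper with a recorded event
print(is_valid(chain))          # False
```

A single party holding the whole chain could simply recompute every hash after tampering; in practice, tamper evidence comes from many independent nodes holding copies and agreeing on the valid chain.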

Table 2. Comparison of recently generated benchmark datasets used in literature studies [78,93,95,104,115].

Dataset	Year Published	Devices Used	Attack Type	Attack Labelled (Y/N)	Data File Format	Key Limitations
BoT-IoT	2019	IoT home devices, including R-Pi, smart lights, plugs, thermostats, IP cameras, etc.	DDoS, DoS, reconnaissance (scanning and probing), theft (information theft and data exfiltration)	Y	PCAP and CSV	Imbalanced data, simulation-based environment, limited network and data size
UNSW-NB15	2015	Network infrastructure devices (routers, switches, etc.), servers, workstations, and firewalls, plus attack simulation tools such as Nmap and Wireshark	Backdoor, DoS, exploits, fuzzers, port scans, reconnaissance, shellcode, spam, worms	Y	CSV, PCAP, ARFF	Simulated attack scenarios, imbalanced data
CIC-IDS 2017	2017	Network infrastructure devices (routers, switches), servers and workstations, virtual and physical machines, IoT end devices, and attack simulation tools	Botnet, XSS, DoS, DDoS, Heartbleed, infiltration, SSH, brute force, SQLi	Y	PCAP, CSV	Imbalanced data, outdated attack variants, and numerous redundant features
CIC-IDS 2018	2018	Network infrastructure devices (routers, switches, etc.), servers, workstations, and firewalls, IoT devices (R-Pi, cameras, etc.), and attack simulation tools such as Nmap and Wireshark	Botnet, brute force, port scan, DDoS, DoS, web attack, and infiltration	Y	CSV	Imbalanced data, a limited and controlled ecosystem used to generate the dataset, several redundant features
IoTID20	2020	IoT end devices (smart plugs, cameras, lights, speakers, etc.) and attack simulation tools such as Wireshark, Hydra, and Medusa	SYN flooding, brute force, HTTP flooding, UDP flooding, ARP spoofing, host port scan, and OS scan	Y	CSV	Imbalanced data, limited attack variety, and a limited ecosystem used to generate the data
MQTT-IoT-IDS 2020	2020	IoT end devices (lights, thermostats, plugs, speakers, etc.), the MQTT communication protocol, and attack simulation tools such as Wireshark, Ettercap, Hydra, and Medusa	DoS, DDoS, brute force, MitM, credential stuffing	Y	CSV	Class imbalance, limited attack variety, and capture in a static environment
TON_IoT	2020	IoT end devices (weather stations, smart fridges, doors, lights, etc.), SCADA, PLC, servers (web servers, file transfer servers, database servers)	Ransomware, password attack, scanning, DoS, DDoS, data injection, backdoor, cross-site scripting (XSS), and MitM	Y	CSV and PCAP	Class imbalance, a dataset consisting of static logs, and numerous redundant features
IoT-23	2020	Smart home devices (Amazon Echo Home device, smart LED lamp, and a door lock)	DDoS, brute force and credential stuffing, malware infections (Mirai, Gafgyt), C2 communications, scanning and exploitation	Y	PCAP, CSV, and TXT	Imbalanced data; the network deployed to generate the dataset is limited and lacks diversity
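Every dataset in Table 2 lists class imbalance as a key limitation. A quick check before any training, sketched here with hypothetical labels (not the real class counts of any listed dataset), is to report per-class shares and the majority-to-minority ratio:

```python
from collections import Counter

def class_distribution(labels):
    """Return per-class counts, per-class shares, and the majority/minority ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {cls: n / total for cls, n in counts.items()}
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return counts, shares, imbalance_ratio

# Hypothetical label column for a toy IoT traffic capture.
labels = ["DDoS"] * 900 + ["DoS"] * 80 + ["Normal"] * 15 + ["Theft"] * 5
counts, shares, ratio = class_distribution(labels)
print(shares["Theft"])  # 0.005
print(ratio)            # 180.0
```

A ratio of this size is why resampling techniques and prevalence-aware metrics such as AUPRC recur throughout the studies compared in Table 3.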

Table 3. Comparison of Recently Proposed Data-driven AI-Based Approaches for IoT Network Security Assessment.

Ref.	AI Model(s)	Dataset(s)	Contributions in Summary
[90]	LSTM, CNN, and CNN-LSTM	IoTID20	Applied SMOTE, ADASYN, and random undersampling (RUS) to address class imbalance. Model performance was evaluated with accuracy, precision, recall, AUC, F1-score, and sensitivity, and the reported results were outstanding.
[150]	Transformer	TON_IoT	Focuses on the analysis of traffic data and IoT sensor telemetry data, exploring both binary and multi-class classification. Missing data points and invalid values of categorical features are substituted with the string “-”, while invalid values of numerical features are substituted with the value “−1”. Each categorical feature is encoded with an integer between 0 and U, where U is the number of unique values of that feature. These codes are then used to compute embedding vectors for the categorical features, while the numerical features are normalized. The model performed exceptionally well.
[122]	Ensemble AdaBoost, LightGBM, and XGBoost	IoTID20 and Car Hacking: Attack and Defense Challenge 2020 (CHADC2020)	Rigorous pre-processing, including median imputation of missing values, correlation-based feature selection, and min/max normalization. Tomek links and edited nearest neighbors (ENN) were used for undersampling, and SMOTE and BorderlineSMOTE for oversampling. Evaluated with precision, recall, F1-score, accuracy, and ROC-AUC, the hybrid ensemble approach achieved remarkable results.
[1]	Neural networks	Edge-IIoTset, WUSTL-IIOT-2021, and IoTID20	Optimization based on the Salp swarm algorithm (SSA) for selecting optimal networks for the neural network model. The combination performed well on the three datasets.
[117]	RF	IoTID20	Pre-processing steps include removing duplicate records and irrelevant features such as Flow_ID, Src_IP, Timestamp, and Dst_IP, replacing missing values with median values, and applying min/max normalization. The study focused on ARP spoofing attack detection, compared simple and ensemble models, and evaluated performance using accuracy, precision, sensitivity, F-measure, and speed. Wrapper feature selection with the RF classifier reduced the IoTID20 feature set by around 50.6% while achieving good performance.
[116]	Neural networks, RF, DT	IoTID20, N-BaIoT, and MedBIoT	Feature reduction and noise filtering based on the RENN, Explore, and DROP5 algorithms, with SMOTE used to handle data imbalance. Standard metrics, including accuracy, precision, recall, specificity, F-score, and G-mean, were used. The RENN and DROP5 filtering models produced excellent results.
[86]	LSTM	UNSW-NB15 and BoT-IoT	Proposed a Protocol-Based Deep Intrusion Detection (PB-DID) architecture, in which a dataset of IoT traffic packets combining flow- and TCP-based features from the UNSW-NB15 and BoT-IoT datasets was analyzed. Non-anomalous, DoS, and DDoS traffic were classified, and typical issues such as class imbalance and over-fitting were addressed. Exceptional performance was reported.
[100]	KNN, RF, CART, and deep learning algorithms (CNN, LSTM, DNN)	TON_IoT	To address class imbalance, SMOTE was used for oversampling and a modified, redundancy-based Tomek link removal technique for undersampling. Sine and cosine cyclic encoding was applied to temporal attributes. The learners performed well in both binary and multi-class classification, based on accuracy, precision, recall, and F1-score.
[109]	Four ensembles (CatBoost, LightGBM, XGBoost, and RF) and four non-ensembles (DT, Logistic Regression, Naive Bayes, and a Multilayer Perceptron)	BoT-IoT	A subset of the original data containing 3,668,522 instances and 43 features was used for experimentation, focusing on the 477 normal instances and 79 information theft instances. AUC and AUPRC metrics were used to gauge the classifiers’ performance.
[114]	DT	BoT-IoT	To ease learning, a subset of the original data was used (only 3 of the 29 BoT-IoT features). AUC and AUPRC metrics were used to evaluate classification performance, and the model performed well.
[107]	KNN, Naive Bayes, and Multilayer Perceptron	BoT-IoT	SMOTE was used to balance the data. Based on accuracy, precision, recall, F1-score, and ROC AUC, the KNN algorithm performed best among the learners.
[91]	RF, Naive Bayes, Neural Networks, SVM, and AdaBoost	IoT-23	A statistical correlation analysis was performed: a correlation matrix was computed for each of the 33 data files in IoT-23 and the results were averaged. Data with no statistical correlation were removed. RF produced the best result.
[80]	Adversarial Autoencoders (AAE) and Bidirectional Generative Adversarial Networks (BiGAN)	IoT-23	Pre-processing involved feature selection, encoding, normalization, and balancing. The generative models outperformed traditional machine learning models.
[99]	Eight classifiers: logistic regression, naive Bayes, DT, SVM, KNN, RF, AdaBoost, and XGBoost	TON_IoT	Missing values were imputed, categorical features were converted with one-hot encoding, and min/max normalization was applied. The Chi² technique was used for feature selection and SMOTE for class balancing. XGBoost outperformed the other models.
[146]	DT, Naive Bayes, SVM, RF, CNN, RNN, XGBoost	KDD CUP 99 and NSL-KDD	PCA was used for dimensionality reduction. Detection accuracy, F1, and AUC were used to compare the learners. The ensemble learning algorithm generally performed better. The Naive Bayes algorithm had low accuracy on the learned data but showed clear advantages when facing new types of attacks, and it trained faster.
[145]	Support Vector Machine (SVM), XGBoost, LightGBM, Isolation Forest (iForest), Local Outlier Factor (LOF), and Double Deep Q-Network (DDQN)	IoT-23	A comparative analysis of supervised, unsupervised, and reinforcement learning techniques on nine malware captures of the IoT-23 dataset, considering both binary and multi-class classification. LightGBM achieved the most reliable results, and the DDQN also performed well on specific tasks.
[4]	Gated recurrent unit (GRU), LSTM, and multilayer perceptron	IoT-23, IoTID20	An ensemble model was proposed, with PCA used for feature reduction. The ensemble model’s performance was evaluated on the IoT-23 and IoTID20 datasets.
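The substitution and encoding steps described for [150] can be sketched as follows, combined with the min/max normalization used in [117], [122], and [99]. This is a hedged illustration, not the authors' code. The function name `preprocess`, the column names, and the choice of min/max scaling for [150] (whose exact normalization the table does not specify) are all assumptions. With U unique values, the integer codes below run from 0 to U−1.

```python
import math

def preprocess(rows, categorical, numerical):
    """Hypothetical helper: rows is a list of dicts; returns encoded rows and vocabularies."""
    # 1) Substitute missing/invalid values: "-" for categorical, -1.0 for numerical.
    cleaned = []
    for row in rows:
        out = {}
        for col in categorical:
            v = row.get(col)
            out[col] = v if isinstance(v, str) and v.strip() else "-"
        for col in numerical:
            v = row.get(col)
            ok = isinstance(v, (int, float)) and math.isfinite(v)
            out[col] = float(v) if ok else -1.0
        cleaned.append(out)

    # 2) Integer-encode each categorical feature with codes 0..U-1 (U = unique values).
    vocab = {col: {v: i for i, v in enumerate(sorted({r[col] for r in cleaned}))}
             for col in categorical}

    # 3) Min/max scale numerical features to [0, 1]; the -1 sentinel is scaled too.
    bounds = {col: (min(r[col] for r in cleaned), max(r[col] for r in cleaned))
              for col in numerical}
    encoded = []
    for r in cleaned:
        e = {col: vocab[col][r[col]] for col in categorical}
        for col in numerical:
            lo, hi = bounds[col]
            e[col] = 0.0 if hi == lo else (r[col] - lo) / (hi - lo)
        encoded.append(e)
    return encoded, vocab

# Illustrative flow records (column names are hypothetical).
rows = [
    {"proto": "tcp", "service": "http", "duration": 0.5, "bytes": 120},
    {"proto": "udp", "service": "", "duration": float("nan"), "bytes": 60},
    {"proto": "tcp", "service": None, "duration": 2.0, "bytes": 300},
]
encoded, vocab = preprocess(rows, ["proto", "service"], ["duration", "bytes"])
print(encoded[0])  # {'proto': 0, 'service': 1, 'duration': 0.5, 'bytes': 0.25}
```

In a real experiment the vocabularies and min/max bounds would be fitted on the training split only and then reused on the test split, to avoid leaking test statistics into training.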
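Study [100] applied sine and cosine cyclic encoding to temporal attributes. The idea is to place a periodic value on the unit circle, so that the end and start of a cycle are neighbors. Below is a minimal sketch, assuming hour of day (period 24) as the example attribute. The function names are illustrative.

```python
import math

def cyclic_encode(value, period):
    """Project a periodic attribute onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * (value % period) / period
    return (math.sin(angle), math.cos(angle))

def circular_gap(a, b, period):
    """Euclidean distance between two cyclically encoded values."""
    return math.dist(cyclic_encode(a, period), cyclic_encode(b, period))

# 23:00 and 00:00 end up close together, whereas a raw integer
# feature would place them 23 units apart.
print(round(circular_gap(23, 0, 24), 3))  # 0.261
print(round(circular_gap(12, 0, 24), 3))  # 2.0
```

Two columns are needed because a single sine (or cosine) value maps two different hours to the same number; the pair is unique for each point in the cycle.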
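SMOTE recurs across Table 3 ([90], [122], [116], [100], [107], [99]). Its core step creates synthetic minority samples on the line segment between a minority sample and one of its k nearest minority-class neighbors. The following is a from-scratch sketch of that interpolation only, not any study's exact configuration. The `smote` helper, the toy points, and `k=2` are assumptions. BorderlineSMOTE and ADASYN are variants that change which samples get oversampled.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic minority samples by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples, all from the minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x (x itself excluded).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # New point on the segment between x and the chosen neighbour.
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (2.5, 3.0)]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))  # 6
```

For real experiments, the imbalanced-learn package provides `SMOTE`, `BorderlineSMOTE`, and `ADASYN` classes with a `fit_resample` method. Resampling should be applied to the training split only, so that synthetic points never leak into evaluation.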
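Tomek links, used for undersampling in [122] and (in a modified form) in [100], are pairs of samples from opposite classes that are each other's nearest neighbor. Removing the majority-class member of each pair cleans the class boundary. Below is a minimal sketch using Euclidean distance on hypothetical one-dimensional points. The helper names are illustrative, and this is the standard rule, not the modified technique of [100].

```python
import math

def nearest_index(points, i):
    """Index of the nearest other point to points[i] (Euclidean distance)."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))

def tomek_links(points, labels):
    """Pairs (i, j) of mutual nearest neighbours that carry different labels."""
    nn = [nearest_index(points, i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]

def remove_majority_tomek(points, labels, majority):
    """Drop the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(points, labels) for k in pair if labels[k] == majority}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]

points = [(0.0,), (0.1,), (1.0,), (1.05,), (3.0,)]
labels = ["normal", "normal", "normal", "attack", "normal"]
print(tomek_links(points, labels))  # [(2, 3)]
kept_points, kept_labels = remove_majority_tomek(points, labels, "normal")
print(kept_labels)  # ['normal', 'normal', 'attack', 'normal']
```

The nearest-neighbor search here is quadratic and intended only for illustration; imbalanced-learn's `TomekLinks` and `EditedNearestNeighbours` classes are the usual practical choices.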
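Several studies ([109], [114], [107]) report AUC and AUPRC. The former can be computed as the probability that a randomly chosen positive scores above a randomly chosen negative. The latter is commonly estimated by average precision, the mean precision at the rank of each positive. The following is a minimal pure-Python sketch with toy labels and scores. It is not the evaluation code of any cited study.

```python
def roc_auc(labels, scores):
    """P(random positive outscores random negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise AUPRC estimate: mean precision at the rank of each positive."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank
    return ap / total_pos

labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))            # 0.833
print(round(average_precision(labels, scores), 3))  # 0.833
```

The two metrics coincide on this toy example but diverge under heavy imbalance. A random classifier's AUC stays at 0.5, while its expected AUPRC falls to the positive-class prevalence. This is why AUPRC is informative for rare classes such as the 79 information theft instances in [109]. With tied scores, this average-precision sketch depends on sort order; scikit-learn's `roc_auc_score` and `average_precision_score` are the standard implementations.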

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


MDPI and ACS Style

Alimi, O.A. Data-Driven Learning Models for Internet of Things Security: Emerging Trends, Applications, Challenges and Future Directions. Technologies 2025, 13, 176. https://doi.org/10.3390/technologies13050176


