1. Introduction
The evolution of IoT network infrastructure has driven a rapid increase in embedded devices and intelligent applications. The IoT objective is to build intelligent environments capable of improving human life quality, comfort, and competitiveness. Devices in smart architectures communicate with one another to execute different tasks. IoT-enabled systems have been used in manufacturing settings as well as for a variety of commercial applications. These intelligent systems span a broad spectrum of capabilities, from smart houses to smart cities, intelligent buildings, and other intelligent utilities such as factory automation and management, power generation networks, and transportation [
1]. The IoT raises several challenges, including privacy and security. The security challenges associated with the IoT will grow as it develops and progresses over the next several years [
2], which further raises the possibility of cyberattacks. According to recent reports, cybercriminals heavily targeted IoT devices in 2020, indicating a significant rise in IoT vulnerabilities on wireless networks [
3]. The growing rewards for successful IoT breaches give attackers substantial motivation to discover innovative and creative ways to compromise IoT applications.
Traditional approaches and strategies used to safeguard the conventional Internet against cyberattacks prove ineffective in defending against the weaknesses specific to the IoT. Security issues in a network are managed by three general methods: prevention, detection, and mitigation, and all three must be applied to ensure effective mitigation strategies for IoT networks. Cybersecurity is an essential component of the information technology systems of today’s IoT world. Although the IoT improves performance and competitiveness through smart control, it also increases exposure to cyberattacks. The IoT privacy and security paradigm is therefore central to today’s new technologies. The increasing diversity of IoT systems on the market shows that the industry is making strides toward revolutionizing IoT architecture. As a result, the specifications governing IoT system connectivity are complex, requiring the development of a unified system to facilitate device communication. The growing range of IoT devices designed for different applications pushes IoT manufacturers to constantly develop IoT technology and reduce their time to market. Customers have benefited from these technologies, and critical facilities have successfully incorporated IoT devices into their operations. However, beyond requiring significant security enhancements, the IoT exposes users’ details to cyberattacks. Although IoT technologies have helped humanity in a variety of respects, they still have several flaws. Despite the fact that many security protocols have been implemented to defend IoT devices from cyber threats, security guidelines are not well established [
4]. More than 85% of companies worldwide will adopt IoT devices in one way or another, yet 90% of these companies lack awareness of IoT device security [
5]. A recent HP report also found that 70% of Internet-connected devices are susceptible to multiple attacks [
1]. Additionally, the active launch of cyberattacks such as Mirai [
6], Shamoon-2 [
7], and Ransom-like [
8] attacks on critical infrastructure indicates that current IoT security measures have been ineffective.
1.1. Motivation
Anomaly identification techniques have been the primary source of motivation for many researchers due to their capacity to identify new threats. Massive numbers of IoT devices are integrated into our daily lives in many ways. They are commonly employed in several sectors, such as healthcare, manufacturing, delivery, road traffic management, city life safety, shopping, sustainability, city protection, smart communities, transportation, waste management, smart street lighting, traffic signs, and vehicle networks [
9]. Designing a fully safe device is impossible because there is no such thing as total security: humans can make mistakes, most current networks contain security vulnerabilities, insider misuse is common, and not all types of intrusions are known. At present, attackers use sophisticated techniques to execute increasingly serious attacks efficiently with little technical knowledge of the network. A practical attack now has a significant impact on IoT infrastructure, while the time taken to carry out such attacks continues to decrease. IoT systems have become an attractive target for attackers looking to launch destructive attacks, and the threat surface of IoT networks will continue to increase as a result of their rapid growth. Cybersecurity remains the most difficult challenge in maintaining the stability of these expanding networks.
Classification of malicious activity in big data is becoming more complex in IoT networks. An intrusion detection system (IDS) is an important component for protecting IoT networks. Intrusion detection research generally focuses on two detection methodologies: anomaly-based and misuse-based approaches. The IDS is now a crucial part of protecting complex IoT networks; it can detect malicious activity or protocol failure in an operating system or network. IDSs can be divided into centralized intrusion detection systems (CIDS) and distributed intrusion detection systems (DIDS). In a CIDS, data analysis is performed at a single location, while a DIDS consists of multiple IDSs at various locations where the data analysis is conducted. Security companies have had to develop IoT defense strategies alongside existing ones in response to the popularity of the Internet. In addition to standard network-based intrusion detection techniques, several alternative tools are available for investigating intrusions or identifying network threats. Machine learning has proven to be both essential and useful in detecting cyberattacks in real time. Various mechanisms can be used to generate knowledge from collected and analyzed data. Supervised learning uses labeled anomalous data to define the inconsistency relative to a reference point, whereas unsupervised learning determines anomalous activity by inferential learning, drawing a decision from identified evidence. Popular machine learning techniques for classification and clustering include support vector machine, Markov-based, grammar-based, neural network, change detection, Bayesian, decision tree, nearest prototype, hierarchical, and outlier-based identification techniques [
10].
1.2. Contribution
Machine learning advances have resulted in new solutions that can adapt to changes in the environment through continuous learning. Although machine learning is increasingly being applied to IoT intrusion detection, it has some shortcomings that should be considered. First, existing work typically analyzes a single type of intrusion and does not consider diverse attack types. Second, identifying effective features for training a machine learning algorithm during the data processing stage is extremely time-consuming, and extracting a huge number of features consumes many resources. As a result, a lightweight method for extracting a relevant, limited set of features for machine-learning-based detection of various IoT attacks is needed. To address these issues, we first take the pcap files of the four most recent intrusion detection datasets and generate adapted datasets using CICFlowmeter [
11]. Next, we combine these datasets into large attack classes. Our adapted datasets have 80 network and flow features, from which we select the 48 best using the recursive feature elimination (RFE) approach. Although multiple techniques have been developed to detect anomalies, CNN and GRU networks for classifying attacks have received far less attention. The model proposed in this paper extends a model employing convolutional neural networks that we previously proposed [
12], in which we used three different convolutional neural network architectures. In this paper, we propose a deep learning model based on CNN and GRU for binary and multiclass anomaly detection and classification in IoT networks. The proposed model uses an input layer, two convolutional layers, two GRU layers, a flatten layer, a dense layer, and an output layer. In the multiclass classifier, the proposed scheme uses a convolutional and GRU-based neural network model to classify 15 different attacks, separating them from normal network traffic. This article thus presents a novel anomaly-based intrusion detection system for IoT networks using convolutional and gated recurrent unit neural networks. Our proposed binary classification model achieved an accuracy of 99.96%, while our proposed multiclass classification model achieved an accuracy of 99.92%.
Artificial intelligence has made tremendous strides in closing the gap between human and computer capabilities. A CNN is a deep neural network that can take in an image as input, assign significance to various aspects of the image, and distinguish one from another. In comparison to other classification algorithms, a CNN needs significantly less preprocessing [
13]. The CNN successfully captures the spatial and temporal properties of an image using the relevant filters. A CNN consists of several artificial neuronal layers, each neuron having its own weights that determine its response. CNNs are fed the image values and find different features in them. A CNN is typically made up of many convolution layers, but it may also include other elements. Convolution is the initial layer, in which the characteristics of an image are extracted; it preserves the relationship between pixels by learning image attributes from small squares of input data. If the images are too large, the pooling layers reduce the number of parameters. Spatial pooling, also known as subsampling or downsampling, is a technique for reducing the dimension of each map while retaining critical information. Max pooling, average pooling, and sum pooling are three types of spatial pooling. The resulting matrix is then flattened into a vector and fed into a fully connected layer, as in a standard neural network. A classification layer is the final layer of a CNN, and it uses the output of the previous convolution layer as input. The classification layer generates a series of confidence values (scores between 0 and 1) based on the activation function of the final convolution layer, indicating how likely the object is to correspond to a “class” [
14]. Short-term memory is a limitation of recurrent neural networks (RNNs): if a sequence is too long, they have difficulty carrying information from earlier to later time steps. RNNs also encounter the vanishing gradient problem during backpropagation [
15]. Gradients are the values used to update the weights of a neural network. When a gradient value becomes very small, it contributes little to learning; therefore, layers receiving a small gradient update struggle to learn. As a result of this lack of learning, RNNs forget what they saw earlier in longer sequences, resulting in short-term memory. GRUs have been developed to address the issue of short-term memory. The GRU is a more recent generation of RNN that is very close to the LSTM. The GRU is equipped with two gates, a reset gate and an update gate, while the LSTM has three gates [
16,
17]. The update gate operates similarly to an LSTM’s forget and input gates combined: it helps the model determine how much past information should be carried forward, deciding what data to discard and what new data to use. The reset gate determines how much previous knowledge is to be forgotten. Since GRUs involve fewer tensor operations, they are slightly faster to train than LSTMs [
15].
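For reference, the gating mechanism described above can be written in the standard GRU formulation (common notation, not specific to this paper: $W$, $U$, and $b$ denote learned weights and biases, $\sigma$ the sigmoid function, and $\odot$ elementwise multiplication):

\begin{align}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{(update gate)}\\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{(reset gate)}\\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big) && \text{(candidate state)}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{align}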
The rest of the paper proceeds as follows: in
Section 2, the related work is presented. The proposed model, data collection, and preprocessing dataset are discussed in
Section 3. The analysis of the results is presented in
Section 4. Finally,
Section 5 concludes the paper and offers ideas for future work.
2. Related Work
Internet infrastructure is evolving rapidly because of advances in computing technology. However, these advances have also introduced issues such as vulnerabilities. Kim et al. [
18] use a convolutional neural network to classify malicious traffic using packet size and arrival time. They achieved an accuracy of 95%, which is considered very low for modern IoT networks. Internet traffic is increasing at an exponential rate, with daily data generation ranging from petabytes to zettabytes. Along with this increase in use, security risks to networks, the web, databases, and organizations are increasing. Hassan et al. [
19] suggest a hybrid deep learning model for effectively detecting network intrusions based on a convolutional neural network and a long short-term memory (LSTM) network. They used a deep convolutional neural network to extract important features from IDS big data and an LSTM to retain long-term correlations between derived features and avoid overfitting on recurrent connections. The accuracy was measured at 97.10%, which is insufficient for today’s IoT networks. Advances in information technology and economic progress have also accelerated the IoT industry. IoT networks are susceptible to attacks due to the limited resources of sensor nodes, the difficulty of networking, and the open wireless broadcast transmission characteristics. Li et al. [
20] suggest an algorithm for extracting IoT features and detecting intrusions in a smart city, built on a deep migration learning paradigm that incorporates deep learning and intrusion detection technologies. Their proposed model achieves a faster detection time and better detection performance than previous models. IoT technologies for smart cities have risen to prominence as a primary target for threats such as botnets. Vinayakumar et al. [
21] suggest a botnet identification algorithm built on a two-tier deep learning architecture for semantically distinguishing botnets from legitimate activities at the domain name system application layer. Due to its architecture for analyzing the domain name system, the approach is highly portable across heterogeneous computing servers.
Intrusion detection and prevention mechanisms continue to be the primary line of protection against severe threats. Kaur et al. [
22] suggest an image-based deep neural network model for classifying various attacks using two large datasets, CICIDS2017 and CSE-CIC-IDS2018. They also identify the best network flow features for detecting these attacks. However, their convolutional neural network model produced poor performance for some attack categories. Most cybercriminals currently use encrypted communication channels to shield malicious activities and mimic legitimate user activity. Such threats over a protected channel increase the vulnerability of interconnected networks to emerging threats and the risk of significant harm to many other end users. Ullah and Mahmoud [
23,
24] proposed a two-level intrusion detection system for IoT networks. The level-1 model categorizes network traffic as regular or irregular, while the level-2 model categorizes observed malicious behavior by category or subcategory. Their model achieves 99.90% precision, recall, and F1 score at both levels. Yang and Lim [
25] present a novel deep-learning-based approach for detecting malicious SSL traffic. The suggested method extracts the unencrypted contents of the reconfigured record and produces a series of unencrypted data from successive SSL records for classification using deep learning. A long short-term memory encoder generates SSL sequences and uses them to build an encoded feature map for each flow.
These feature maps are forwarded to a convolutional network classifier to determine whether the SSL traffic is abnormal. The massive number of IoT devices and their pervasive nature have drawn hackers looking to perform cyberattacks and data breaches. Ran et al. [
26] propose a framework for intrusion detection based on stacked bidirectional long short-term memory (LSTM) networks. They used the KDD99 dataset for model evaluation, and their model achieved 91.6% accuracy. The authors did not use an early stopping strategy, which may cause overfitting of the model. Ahmad and Alsemmeari [
27] proposed the extreme learning machine (ELM) approach to enhance intrusion detection. The authors examine, investigate, and apply well-known activation functions such as sine, sigmoid, and radial basis to quantify their success on a genetic algorithm (GA) feature subset and the full feature set. Their findings indicate that the radial basis and sine functions perform better on the GA feature set than on the complete feature set, while the sigmoid function performs almost identically on both feature sets. GA-based feature selection achieved 98% accuracy and improved the overall performance of the extreme learning machine for intrusion detection.
Ling et al. [
28] developed a bidirectional simple recurrent unit-based intrusion detection system. Their suggested approach is more precise and needs far less training time than alternative approaches. Kunang et al. [
29] used a pretraining strategy with a deep autoencoder and a deep neural network to build a deep learning intrusion detection framework. An automatic hyperparameter optimization method helps determine hyperparameter importance and the best categorical hyperparameter configuration to improve detection accuracy, and the performance outcomes exceed prior techniques in terms of multiclass evaluation criteria. In other work, a convolutional neural network model was used to develop an intrusion detection system that interprets network activity data as character sequences; the input matrices of the convolutional networks are combined into a complex matrix structure to perform image classification. That model worked well on training data but performed poorly on testing data. To build an attack prevention mechanism and a secure network, Sicato et al. [
30] offer a comprehensive summary of emerging intrusion detection systems for IoT environments, address cybersecurity risks, and evaluate and analyze open issues and concerns. They suggest a distributed cloud infrastructure built on a software-defined IDS for securing the Internet of Things. Standard rule-based intrusion prevention techniques are insufficient to handle highly dynamic network intrusion traffic. Moreover, the ability of intrusion detection systems based on traditional machine learning approaches to generalize is still limited, and their false alarm rate is high. Wu et al. [
31] suggest SRDLM, a modern intrusion detection approach based on deep learning and semantic re-encoding. Their approach re-encodes network traffic, improves the traffic’s distinguishability, and strengthens the system’s generalizability through deep learning technology, significantly increasing the system’s efficiency and performance. On the NSL-KDD dataset, the average score improved by more than 8% compared with standard machine learning approaches. Sheu et al. [
32] use a reinforcement learning algorithm to design a system for identifying the opening time and load restrictions of electrical equipment. The suggested system is based on a wireless communication network; it can monitor the energy consumption of home appliances, control smart appliances, and reduce the rate of fires caused by electrical appliance overload. Wireless sensor networks are vulnerable to hostile activity because of their security limitations. Tariq [
33] has designed an anticipatory and proactive mechanism to predict host and grid anomalies. The proposed anomaly identification system’s architecture is widely distributed to provide an accessible and adaptive technique for avoiding a single point of failure.
Table 1, adapted from [
12], provides an overview of the related literature that was evaluated. In
Table 1, DR denotes detection rate, Acc accuracy, Pr precision, and F1 the F1 score.
We studied deep neural network models from 2017 to 2021. Most of the models used the KDD99 dataset for evaluation. The KDD99 dataset is very old and was not created for IoT networks; as a result, it cannot be used to assess an intrusion detection framework for IoT networks. Many deep learning models for binary classification were created with accuracy as the only performance metric, and multiclass classification models show a very poor degree of accuracy. None of the models combined CNN and GRU to perform intrusion detection. This paper used CICFlowmeter [
11] to retrieve network features from the pcap files of four publicly available datasets. These datasets were developed using real and simulated IoT networks. We evaluate our proposed model using binary and multiclass classification, with accuracy, precision, recall, and F1 score as performance metrics. The proposed model achieved a high detection rate and a low false alarm rate.
3. Proposed Model
Recently, convolutional neural networks have shown strong performance in speech and image recognition. Recurrent neural networks are frequently used in speech understanding, language synthesis, language modeling, and language generation. Both convolutional and recurrent neural networks generate interesting results in these areas, and their use is becoming more popular. Intrusion detection problems can be effectively recast as convolutional neural network problems through feature mapping. In this paper, we use a convolutional neural network and a gated recurrent neural network. Our proposed multiclass CNNGRU model is described in
Figure 1a, while our proposed binary class CNNGRU model is shown in
Figure 1b. The multiclass model consists of an input layer, two convolutional layers, two GRU layers, a flatten layer, a fully connected dense layer, and an output layer. The binary class model consists of an input layer, one convolutional layer, one GRU layer, a flatten layer, a fully connected dense layer, and an output layer. The reshaped input data are fed to the input layer. The convolution layer extracts characteristics from the input and maintains the relationship between pixels while learning image properties from small squares of input data. Batch normalization seeks to standardize the inputs to a neural network layer; here, the batch normalization layer normalizes the output of the convolution layer before the average pooling layer. The pooling layer condenses the extracted features into sub-maps of robust features. The average pooling layer summarizes each patch of the feature map by averaging the values it contains.
Overfitting may occur when neural networks have difficulty distinguishing between valid and invalid results; thus, further optimization of the parameters against the test dataset is often required. A dropout layer prevents overfitting by dropping some neurons during model training. The flatten operation restructures the tensor into a one-dimensional vector whose element count equals that of the input tensor, excluding the batch dimension. The flatten layer is fully connected to a dense layer of 512 neurons, and the number of neurons in the output layer equals the number of classes used for classification. The CNNGRU model was trained, validated, and tested using six IoT intrusion detection datasets. Determining which features to use is a critical phase in machine learning. Feature selection identifies and selects the features required to improve prediction; it minimizes overfitting, accelerates model training, and strengthens the model’s resistance to test inaccuracies. In this article, we extract important features from our proposed datasets using a feature selection method known as recursive feature elimination [
72,
73]. The feature selection method estimates the overall significance of features using a random forest classifier. Tenfold cross-validation tests were performed to ensure that the feature selection model was not suffering from overfitting. The feature selection algorithm uses the IoT-DS-2 dataset and selects the 48 best features. The IoT-DS-2 dataset was used for feature selection since it contains attack data from all datasets.
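A minimal sketch of this selection step using scikit-learn is given below. It assumes the adapted dataset is available as a CSV file; the file name and label column name are hypothetical, and all feature columns are assumed numeric.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# Load the adapted IoT-DS-2 dataset (file and column names are assumptions)
df = pd.read_csv("IoT-DS-2.csv")
y = df["Label"]
X = df.drop(columns=["Label"]).select_dtypes("number")

# Rank the 80 flow features with a random forest and keep the 48 best
rfe = RFE(estimator=RandomForestClassifier(n_estimators=100, n_jobs=-1),
          n_features_to_select=48)
rfe.fit(X, y)
selected = X.columns[rfe.support_]

# Tenfold cross-validation to check that the selection does not overfit
scores = cross_val_score(RandomForestClassifier(n_estimators=100, n_jobs=-1),
                         X[selected], y, cv=10)
print(sorted(selected), scores.mean())
```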
A 1D convolution operates along the temporal axis with single-direction kernel movement; it uses two-dimensional input and output data, as is typical for time series. The input layer receives an input vector of shape (48, 1) containing the 48 best features. Two convolution layer blocks follow the input layer; each block consists of a convolution layer, an activation layer, and a dropout layer. Convolution layers collect input features and compute vector properties for small data samples within the input. The first convolution layer uses a ReLU activation function, 64 filters, and a kernel size of 8. The second convolution layer uses 128 filters and the ReLU activation function. Batch normalization ensures that the inputs are continuously normalized, substantially reducing the difficulty of coordinating changes across many layers. The average pooling layer downsamples feature maps by summarizing features. We used a dropout layer with a drop rate of 0.1 to regularize the model and minimize overfitting. There are two GRU layer blocks; each consists of a GRU layer, an activation layer, and a dropout layer. The two GRU layers use 512 units each, the activation layers use the ReLU activation function, and the dropout layers use a dropout rate of 0.1. The flatten layer converts the tensor into a one-dimensional vector of its components. Five hundred and twelve neurons are used in the dense layer, while the number of neurons in the output layer equals the number of classes in the dataset.
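A minimal Keras sketch of the multiclass architecture just described follows; details not stated in the text (the second kernel size, pooling size, and padding) are assumptions, not the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_gru(num_classes=16):  # 15 attack classes + 1 normal (IoT-DS-2)
    model = models.Sequential([
        layers.Input(shape=(48, 1)),                  # 48 selected features
        layers.Conv1D(64, kernel_size=8, padding="same", activation="relu"),
        layers.Dropout(0.1),
        layers.Conv1D(128, kernel_size=8, padding="same", activation="relu"),
        layers.Dropout(0.1),
        layers.BatchNormalization(),                  # normalize conv output
        layers.AveragePooling1D(pool_size=2),         # downsample feature maps
        layers.Dropout(0.1),
        layers.GRU(512, activation="relu", return_sequences=True),
        layers.Dropout(0.1),
        layers.GRU(512, activation="relu", return_sequences=True),
        layers.Dropout(0.1),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```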
Many rounds of training are required for deep learning models to reach the convergence point. However, the number of training rounds can be minimized by selecting a parameter configuration that promotes faster convergence during the training phase and guides the network structure. Overfitting can be avoided using regularization. We tuned the multiclass model using the kernel_regularizer, bias_regularizer, and activity_regularizer regularization approaches, and the binary classification model was trained using the same hyperparameters. To support feature learning over time, we assign random initial values to the CNN model layers. To prevent overfitting, we use L1, L2, and dropout regularization. To update the model weights, we use the Adam optimizer and the sparse categorical cross-entropy loss function.
A key machine learning hyperparameter is the learning rate, which controls how much the model changes from one iteration to the next. We conducted many tests with various learning rates for the Adam optimizer and determined that 0.0001 yielded the greatest detection rate. Finally, we adopted an early stopping technique to avoid overfitting: the model monitors the validation loss and ends the training phase if it does not decrease after a specified number of cycles. To obtain the best possible network performance, the epoch count was tuned until accuracy no longer improved with additional epochs. For the proposed model, we tried 50, 100, 200, 500, and 1000 epochs and chose 100 epochs as the optimum, since all trials of the model converged within this time frame. Activation functions are critical parameters for deep learning algorithms. The convolution, GRU, and dense layers use the ReLU activation function; the GRU uses a sigmoid recurrent activation, and the output layer uses softmax activation. The batch size is an essential hyperparameter in deep learning models. A larger batch size can reduce processing time and speed up training across multiple nodes; larger batch sizes yield training losses similar to smaller ones but tend to generalize worse on test results. For training and testing the proposed model, a batch size between 64 and 128 was found to be optimal.
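Under the same assumptions as the model sketch above, the training configuration described here might look as follows; the early-stopping patience is an assumption, and X_train, y_train, X_val, and y_val denote the prepared splits described in Section 3.2.

```python
import tensorflow as tf

model = build_cnn_gru()  # defined in the sketch above
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # tuned rate
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Early stopping: end training when validation loss stops improving
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=100, batch_size=128,  # batch size in the 64-128 range
                    callbacks=[early_stop])
```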
3.1. Data Collection
The initial phase involves the management of raw network traffic. This paper used the pcap files of four publicly available datasets and extracted network features using CICFlowmeter [
11]. The CICFlowmeter is open-source software that generates CSV files from pcap data, producing 80 unique network features. The BoT-IoT dataset was developed by Koroniotis et al. [
74]. The testbed environment incorporated five IoT devices and was used to create a realistic smart home architecture. These devices were operated locally and connected to cloud networks using the Node-RED scheme, which allowed for the generation of normal network traffic. The BoT-IoT data collection is shown in
Table 2. There are four attack types, which are further subdivided into ten subtypes. A comprehensive description of the testbed settings and attacks can be found in the original article [
74]. The newly developed botnet dataset has been made publicly available, and a link is included in [
75]. Kang et al. [
76] created a dataset for detecting IoT network intrusions. The IoT network intrusion dataset was created using two typical smart home devices: an SKT NGU smart home device and an EZVIZ Wi-Fi camera. These two IoT devices are connected to a smart home Wi-Fi router and are used as victim devices. There are four attack categories and eight subcategories. The dataset for IoT network intrusions is shown in
Table 3. A link to the newly developed IoT network intrusion dataset is included in [
77].
The MQTT-IoT-IDS2020 dataset was created by Hindy et al. [
78]. This dataset consists of both regular network traffic and brute-force attacks on the MQTT networking platform. The network comprises 12 MQTT sensors, a broker, a device replicating a camera stream, and an attacker. The dataset includes the most popular MQTT attacks and scenarios for analyzing real-world IoT devices. The MQTT-IoT-IDS2020 dataset is presented in
Table 4. The MQTT-IoT-IDS2020 dataset contains four attack categories. The new MQTT-IoT-IDS2020 dataset can be accessed at [
79]. The Stratosphere Laboratory of the CTU in the Czech Republic created the IoT-23 dataset [
80]. It contains 20 malicious captures and 3 non-malicious captures. The IoT-23 dataset was created to provide researchers with a large, labeled dataset of real-world IoT devices and IoT malware infections for designing machine learning models. It contains 20 separate network operation models to prototype various IoT device use cases. The dataset aims to provide two distinct collections: one comprising benign and malicious network captures and another containing only benign IoT network captures. The IoT-23 dataset can be seen in
Table 5. The IoT-23 dataset contains nine attack categories. The IoT-23 dataset is available at [
79]. We merged BoT-IoT, IoT Network Intrusion, and MQTT-IoT-IDS2020 datasets to increase the number of attack classes in the dataset. Nine attack classes and one normal class comprise the new dataset.
Table 6 describes this new dataset, known as IoT-DS-1. The IoT-DS-1 dataset can be found at [
79]. The BoT-IoT, IoT Network Intrusion, MQTT-IoT-IDS2020, and IoT-23 datasets were then merged. The new dataset, named IoT-DS-2, includes 15 attack classes and 1 normal class, as shown in
Table 7. The data collection IoT-DS-2 can be accessed at [
79].
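As a rough illustration of how the merged datasets could be assembled, assuming each source dataset has already been converted to CSV by CICFlowmeter (all file names here are hypothetical):

```python
import pandas as pd

# Hypothetical per-dataset CSVs produced by CICFlowmeter
parts = ["bot_iot.csv", "iot_network_intrusion.csv", "mqtt_iot_ids2020.csv"]
iot_ds_1 = pd.concat((pd.read_csv(p) for p in parts), ignore_index=True)
iot_ds_1.to_csv("IoT-DS-1.csv", index=False)

# IoT-DS-2 additionally incorporates the IoT-23 captures
iot_ds_2 = pd.concat([iot_ds_1, pd.read_csv("iot_23.csv")], ignore_index=True)
iot_ds_2.to_csv("IoT-DS-2.csv", index=False)
```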
3.2. Preprocessing Dataset
When features from pcap files have been extracted and analyzed, the next step is to label individual dataset instances. Each dataset has its own set of criteria for determining whether an instance is normal or malicious. Our proposed model aims to cover all IoT networks; however, the flow ID, source IP, and destination IP features are unique to a particular IoT network, so these features were removed from all datasets. We filled NaN values with 0 in all datasets. CICFlowmeter generated redundant instances when converting the pcap files to CSV files; these duplicate instances were removed from all datasets so that previously unseen data could be used to evaluate the model performance during the testing phase. We normalize the input feature columns to a specified range (−1, 1) to remove extreme values and significantly accelerate the computations. Non-numeric columns are converted to numeric columns. In binary classification, an anomaly is represented by a value of 1, whereas normal traffic is represented by 0. The multiclass labels were encoded 0 to 3 for the BoT-IoT dataset, 0 to 4 for the IoT network intrusion detection dataset, 0 to 4 for the MQTT-IoT dataset, and 0 to 9 for the IoT-23 dataset. In each dataset, the label 0 represents normal network traffic, and the remaining labels represent the respective attack types.
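A condensed sketch of this preprocessing pipeline is shown below; the file name and CICFlowmeter column names ("Flow ID", "Src IP", "Dst IP", "Label") are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

df = pd.read_csv("IoT-DS-2.csv")  # hypothetical merged dataset

# Drop network-specific identifiers so the model generalizes across networks
df = df.drop(columns=["Flow ID", "Src IP", "Dst IP"], errors="ignore")

df = df.fillna(0)            # replace NaN values with 0
df = df.drop_duplicates()    # remove redundant CICFlowmeter instances

# Encode labels numerically (a fixed mapping can be used to keep 0 for
# normal traffic, since LabelEncoder orders classes alphabetically)
y = LabelEncoder().fit_transform(df["Label"])

# Scale the numeric feature columns to the range (-1, 1)
X = df.drop(columns=["Label"]).select_dtypes("number")
X = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)
```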
As described above, the merged IoT-DS-1 dataset contains nine attack classes and one normal class (see
Table 6) and was labeled using a multiclass scheme ranging from 0 to 9. The combined BoT-IoT, IoT network intrusion, MQTT-IoT-IDS2020, and IoT-23 dataset, named IoT-DS-2, includes 15 attack classes and one normal class, as shown in
Table 7. The IoT-DS-2 dataset was labeled into normal and attack categories using a multiclass scheme ranging from 0 to 15. Since there is a clear class imbalance in the training set, we applied distinctive class weights to better expose the classifiers to each class. Google Colab Pro was used to develop the models, with the TensorFlow framework and Keras implementations. To perform the classification, the data are first run through the preprocessing steps and then split into three sets: training, validation, and testing. The dataset was initially split into 80% for training and 20% for testing; the training set was then further subdivided into 80% for training and 20% for validation, with each split performed in a stratified way.
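A brief sketch of the stratified splits and class weighting, under the same assumptions as the preprocessing sketch above (the random seed is arbitrary):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

# 80/20 train/test split, then 80/20 train/validation, both stratified
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, stratify=y_train, random_state=42)

# Emphasize minority classes via balanced class weights during training
weights = compute_class_weight("balanced", classes=np.unique(y_train),
                               y=y_train)
class_weight = dict(enumerate(weights))
# model.fit(..., class_weight=class_weight)
```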