Intrusion Detection System Using Feature Extraction with Machine Learning Algorithms in IoT

With the continuous increase in Internet of Things (IoT) device usage, more interest has been shown in internet security, specifically focusing on protecting these vulnerable devices from malicious traffic. Such threats are difficult to distinguish, so an advanced intrusion detection system (IDS) is becoming necessary. Machine learning (ML) is one of the promising techniques as a smart IDS in different areas, including IoT. However, the input to ML models should be extracted from the IoT environment by feature extraction models, which play a significant role in the detection rate and accuracy. Therefore, this research aims to introduce a study on ML-based IDS in IoT, considering different feature extraction algorithms with several ML models. This study evaluated several feature extractors, including image filters and transfer learning models, such as VGG-16 and DenseNet. Additionally, several machine learning algorithms, including random forest, K-nearest neighbors, SVM, and different stacked models were assessed considering all the explored feature extraction algorithms. The study presented a detailed evaluation of all combined models using the IEEE Dataport dataset. Results showed that VGG-16 combined with stacking resulted in the highest accuracy of 98.3%.


Introduction
The Internet of Things (IoT) has recently been one of the most important research topics. The IoT is a new technological paradigm defined as a global network of connected electronic devices. It aims to improve daily life by automating routine operations without human intervention. The number of devices connected to the IoT has risen significantly, and this growth has been accompanied by an increase in attacks against IoT devices. Security concerns about the impact of these attacks on connected devices have naturally increased. Given the sensitivity of the information available on IoT devices, it has become necessary to find solutions to detect and respond to these attacks [1].
Because of its weaknesses, the IoT is vulnerable to attacks and security threats [2][3][4][5][6][7][8][9][10][11]. Researchers have attempted to categorize the attacks, vulnerabilities, and security concerns affecting the IoT so that solutions can be identified more easily. For example, some researchers categorized the vulnerabilities according to the layers of the IoT architecture. Reported concerns for IoT devices include a lack of physical security hardening, insecure data storage and transfer, a lack of visibility and device management, botnets, insecure passcodes, insecure ecosystem interfaces, and AI-based attacks. Not all studies, however, emphasize these vulnerabilities and security risks.
Intrusion detection systems (IDSs) are essential techniques for preserving network security, and they are installed at a critical location in the network [12,13]. A traditional system comprises a data source, preliminary data processing, and a decision-making technique. The process begins with the collection of raw data from host or network traffic. By analyzing the network traffic, an IDS can classify the network behavior as normal or abnormal [12] and then pass the processed features to the decision-making method to recognize threats [13]. The three main ways to detect intrusions are signature-based IDS, anomaly-based IDS, and a hybrid of signature- and anomaly-based IDS [13]. Dynamic anomaly-based network detection systems are more flexible than static signature-based network intrusion systems because the former can detect new attacks [14]. They use artificial intelligence (AI) algorithms built on both machine learning (ML) and deep learning (DL) architectures. Signature-based IDSs, on the other hand, detect signatures and patterns and match them against predefined misuse signatures, which can be ineffective against unknown attacks [13]. The three significant categories of intrusion detection systems are host intrusion detection systems (HIDSs), network intrusion detection systems (NIDSs), and network node intrusion detection systems (NNIDSs) [13]. The HIDS is installed on the machines of the network and on other parts of the physical and virtual networks and protocols. The NIDS protects vulnerable parts of the network where the opportunities for attack are high. IDSs use network- or host-based methods to recognize and deflect attacks. These methods search for attack signatures, that is, patterns that indicate malicious or suspicious activity. Depending on whether an IDS searches for the pattern in network traffic or in log files, it is classified as network- or host-based [15].
Machine learning methods are extensively used to build network intrusion detection systems because of their capability to learn to recognize new intrusions [16]. Developing accurate algorithms that can cluster, classify, and predict requires sizable datasets and supervised machine learning techniques such as SVM and naïve Bayes. Decision trees offer simplicity, rapid adaptability, and accuracy, and neural networks have been widely used to characterize anomaly and misuse patterns [12,16]. Accuracy and interpretability are essential factors of artificial intelligence models, and both machine learning and deep learning techniques must be considered to achieve them. For example, black-box algorithms provide higher accuracy, while white-box algorithms provide feature engineering [14].

Significance of the Study and Contributions
Threats are overwhelmingly increasing in several fields, such as IoT, online banking, industries, and healthcare. Further, IoT usage has been widely accepted due to its success in wearables, smart homes, and smart cities around the world. Unfortunately, IoT devices work on public networks with bounded computing power and limited storage and bandwidth. As a result, they are more vulnerable to assaults than other end-point devices. Several techniques have been proposed in the literature, yet there is great room for improvement, especially when it comes to intrusion detection.
To overcome these issues, the contributions of the present study are as follows: (1) a comprehensive review of the literature on the applications of machine learning and deep learning models in intrusion detection using numerical and image-based datasets; (2) dataset preprocessing and balancing with the SMOTE technique; (3) feature extraction using various approaches, combined with individual and stacked machine learning models such as kNN, sequential minimal optimization (SMO), and random forest, to distinguish between malicious and normal network traffic patterns; (4) experiments for the validation of the proposed models.
The experimental results on the IEEE Dataport image dataset reveal that the proposed techniques are promising in terms of accuracy.
The rest of the article is sectioned as follows: A comprehensive review of the related work is presented in Section 2. Section 3 describes the methodological steps, Section 4 provides the experimental results, and Section 5 concludes the paper.

Related Work
This section discusses prior work on the main machine learning techniques applied to IoT traffic.
Rose et al. [17] generated a dataset and developed a model to investigate the possibility of using network profiling and machine learning to protect the IoT against cyber-attacks. The authors suggested an anomaly-based intrusion detection system that profiles and constantly monitors all networked devices to identify IoT device tampering attempts and suspicious network transactions. They evaluated the performance of the suggested methodology using regular and malicious network traffic on the Cyber-Trust testbed. The experimental findings reveal that the suggested anomaly detection system produces good results, with 98.35% accuracy and 0.98% false-positive alerts.
Ali et al. [18] present a general machine learning strategy for identifying IoT devices and evaluating the trained models against four publicly available datasets. NFStream extracted 85 attributes from packet capture (.pcap) files to better identify IoT devices in the network using machine learning models. The authors used the information gain approach to choose 20 characteristics and trained six machine learning models in the tests. In the training phase, the authors achieved high accuracy, reaching 99% for IoT device identification using random forest and naïve Bayes classifiers.
El-Sayed et al. [19] examined and compared seven different supervised learning algorithms with various difficulty levels to pick the best one. The algorithms were separated into two groups: the category of CNN classifiers, which included a two-layer CNN, a four-layer CNN, and VGG16, and the category of ordinary classifiers, which included logistic regression, support vector machine, and K-nearest neighbors. Experimental findings reveal that the SVM algorithm obtains the maximum performance of 94% on MobileNetv2 features because of its rapid and steady training performance with fewer resources compared with other models. Le et al. [20] present IMIDS, an intelligent intrusion detection system (IDS) for IoT devices. The core of IMIDS is a lightweight convolutional neural network model that can categorize numerous cyber threats and surpasses its competitors with an average F-measure of 97.22%. Furthermore, after being further trained on the data supplied by an attack data generator, the detection performance of IMIDS increased significantly. These findings show that IMIDS may be used as an IDS in IoT.
Joo et al. [21] proposed a deep learning-based IoT intrusion detection system. First, the categorization was performed with a CNN, whose best score was 86.2%. Second, in a hybrid technique, machine learning classifiers replaced the fully connected layers of the vanilla CNN, which delivered roughly 87% with the additional tree classifier. Finally, the Xception model was merged with a bidirectional GRU, yielding the best accuracy of 95.6%. For quicker identification and classification of new (zero-day) malware, Bendiab et al. [22] propose an IoT technique that analyzes malware traffic based on DL and visual representation. The suggested technique detects malicious network traffic at the packet level, lowering detection time, with promising outcomes thanks to the deployed deep learning. To test the efficacy of the proposed technique, the authors created a dataset of 1000 .pcap files of benign and malware traffic obtained from several network traffic sources. The findings for the Residual Neural Network (ResNet50) are quite encouraging, with a detection rate of 94.50% for malware traffic.
Six machine learning (ML) approaches were tested for their ability to identify MQTT-based attacks [23]. Packet-based, unidirectional flow, and bidirectional flow features were evaluated as three abstraction levels. A simulated MQTT dataset was created and used for training and assessment. The experimental findings showed that the suggested ML models were sufficient for the IDS needs of MQTT-based networks. Furthermore, the findings highlight the significance of employing flow-based features to distinguish MQTT-based attacks from benign traffic, whereas packet-based features are sufficient for typical networking attacks. The results reveal that the best model has an accuracy of 99.04%. Sapre et al. [24] employed KDDCup99 and NSL-KDD, two widely used intrusion detection datasets, in their study. Their major objective was to thoroughly compare the two datasets by analyzing the performance of multiple machine learning (ML) classifiers trained on them using a more extensive range of classification metrics than prior studies. Because the classifiers trained on the NSL-KDD dataset were 20.18% less accurate on average, the authors concluded that the NSL-KDD dataset is of better quality than the KDDCup99 dataset. This is because classifiers trained on the KDDCup99 dataset were biased toward its redundancies, allowing them to attain a higher accuracy of 96.83%. Liu et al. [25] looked at attacks that might affect sensors and networks in IoT scenarios using the NSL-KDD dataset. The authors investigated eleven machine learning techniques and reported their performance in identifying the introduced attacks. Through numerical analysis, they showed that tree-based approaches and ensemble methods surpass the other machine learning methods evaluated. With 97% accuracy, a 90.5% Matthews correlation coefficient (MCC), and a 99.6% area under the curve (AUC), XGBoost is the best of the supervised algorithms. Furthermore, the expectation-maximization (EM) technique, which is an unsupervised approach, performs exceptionally well in identifying attacks in the NSL-KDD dataset and beats the naïve Bayes classifier by 22.0% in terms of accuracy.
To distinguish benign from malicious nodes, Amouri et al. [26] used a methodology that consists of two stages. In the first stage, the data are collected by dedicated sniffers (DSs), and the CCI is then generated and regularly sent to the super node (SN). In the second stage, the SN applies a linear regression method to the CCIs collected from different DSs to distinguish benign from malicious nodes. The detection characterization is shown for several extreme cases in the network using two mobility models, namely random waypoint (RWP) and Gauss Markov (GM). Black hole and distributed denial of service (DDoS) attacks are the two malicious activities considered in the work. Nodes in high-velocity scenarios showed detection rates of over 98%, while nodes in low-velocity scenarios showed detection rates of approximately 90%. Fenanir et al. [27] created a lightweight intrusion detection system (IDS) using two machine learning techniques, one for feature selection and one for feature classification. A filter-based method was used to select features because of its low computational cost. The feature classification approach for the system was chosen through a comparison of logistic regression (LR), naïve Bayes (NB), decision tree (DT), random forest (RF), k-nearest neighbor (KNN), support vector machine (SVM), and multilayer perceptron (MLP). Finally, the DT method was chosen for the system due to its excellent performance across various datasets. The study's outcomes might help in choosing the optimum feature selection approach for machine learning; the best reported result is 98% accuracy.
Islam et al. [28] pointed out numerous types of IoT threats and discussed shallow IDSs in the IoT environment, such as decision tree (DT), random forest (RF), and support vector machine (SVM), as well as DL-based IDSs, such as deep neural network (DNN), deep belief network (DBN), long short-term memory (LSTM), stacked LSTM, and bidirectional LSTM (Bi-LSTM). The models were assessed using five standard datasets: NSL-KDD, IoTDevNet, DS2OS, IoTID20, and the IoT Botnet dataset. The performance of the shallow and deep learning-based IDSs was evaluated using several performance indicators such as accuracy, precision, recall, and F1-score. According to the research, the deep learning IDS surpasses shallow machine learning in detecting IoT threats; the best reported accuracy is 98.79%. Using characteristics from the UNSW-NB15 dataset, Ahmad et al. [29] suggest feature clusters based on flow, Message Queuing Telemetry Transport (MQTT), and Transmission Control Protocol (TCP) features. The authors report that this approach addresses overfitting, the curse of dimensionality, and dataset imbalance. The proposed method applied supervised machine learning (ML) methods such as random forest (RF), support vector machine, and artificial neural networks to the clusters. The model reaches 98.67% and 97.37% accuracy using RF in binary and multiclass classification, respectively. With the cluster-based approaches, applying RF to flow and MQTT features, TCP features, and top features from both clusters yielded classification accuracies of 96.96%, 91.4%, and 97.54%, respectively. A two-stage hybrid technique was proposed by Saba et al. [30]. To increase the accuracy of the suggested system, a genetic algorithm (GA) is first used to select suitable features. The support vector machine (SVM), ensemble classifier, decision tree, and other well-known machine learning (ML) algorithms are then applied. Using the NSL-KDD dataset, they attained 99.8% accuracy with 10-fold cross-validation.
Based on a hybrid convolutional neural network model, Smys et al. [31] suggested an intrusion detection system for IoT networks that can identify many forms of attack. The proposed paradigm may be used in a variety of IoT scenarios. The study is validated and compared with machine learning and deep learning models. The suggested hybrid model is more sensitive to threats in the IoT network, with a 98.6% accuracy rate. Papafotikas et al. [32] propose a digital system incorporating a machine learning (ML)-based clustering method for identifying suspicious activities using current supply characteristic dissipation. The K-means clustering algorithm accompanied by supervised training is used in this prototype system. This research demonstrated the successful identification of suspicious activity in intelligent IoT devices. Similarly, the study in [33] proposed an IDS approach using a fused machine learning model. Three datasets, namely KDD, CUP-99, and NetML-2020, were fused under a newly built machine learning-based architecture. The trained model was promising, with an accuracy of 95.18%.
Further, several researchers in the literature have comprehensively surveyed and emphasized the significance of machine learning and deep learning models in the IDSs involving IoT networks [34][35][36][37], especially in conjunction with cloud computing, namely the Cloud of Things security aspect [38]. This is mainly because it involves several intermediate public networks and stakeholders, making it more vulnerable to attacks. Table 1 summarizes related work approaches, including the techniques used, dataset type, and the respective study's advantages and disadvantages.

Methodology
This section includes the methodology, an overview of the dataset, data preprocessing, and a brief description of the algorithms and techniques used for feature extractions. The parameters used to evaluate the models are then presented. The objective of the modeling presented in this work is to distinguish regular traffic from malicious traffic. Therefore, several models are developed on the obtained dataset, in image format, and compared. Figure 1 illustrates an example of a stacked model with multiple feature extractors. Firstly, the data are pre-processed to improve model accuracy as further discussed in the subsequent section. Different feature extractors are then applied to extract relevant features; individual and multiple feature extractors are used to facilitate this. Finally, different machine learning algorithms are trained to classify the traffic.

Dataset Description
The dataset utilized in this research was obtained from IEEE Dataport [39]. The dataset contains more than 800 samples of normal and malicious traffic in binary visualization format for model training. It is a benchmark dataset for intrusion detection systems in the image format. Due to the rich visual features, it has more significance than a numerical dataset. Furthermore, additional data are also provided in image format, generated from the five attack scenarios presented in [39]. Figure 2 provides examples of normal and malicious traffic packages in image format. It is clear from the examples provided that two images of normal or malicious traffic can be significantly different and, in certain instances, maybe like the other category. Therefore, by adopting machine learning techniques, there is an opportunity to differentiate between the two categories with high accuracy.
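As a rough illustration of what a binary visualization of traffic can look like, the sketch below lays out the raw bytes of a capture on a fixed-width grayscale grid. The function name `bytes_to_grid`, the width of 16, and the zero padding are assumptions made for illustration only; the images in [39] may have been produced with a different mapping.

```python
from typing import List


def bytes_to_grid(payload: bytes, width: int = 16, pad: int = 0) -> List[List[int]]:
    """Lay out raw bytes row by row as grayscale intensities in 0-255."""
    if width <= 0:
        raise ValueError("width must be positive")
    values = list(payload)
    # Pad the last row so that every row has the same number of pixels.
    remainder = len(values) % width
    if remainder:
        values.extend([pad] * (width - remainder))
    return [values[i:i + width] for i in range(0, len(values), width)]


if __name__ == "__main__":
    sample = bytes(range(40))  # hypothetical stand-in for captured traffic bytes
    grid = bytes_to_grid(sample, width=16)
    print(len(grid), "rows x", len(grid[0]), "columns")
```

Each row of the returned grid can be written out as one row of image pixels by any imaging library.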

Data Preprocessing
Data preprocessing is a crucial element before applying any machine learning model. It is required to address any inconsistency, errors, or noise in the data [40]. The model's performance can be significantly impacted if the data are poorly preprocessed. Data preprocessing consists of several steps: cleaning, transformation, and feature selection. For image preprocessing, filters are deployed to denoise the dataset. Several filters have been developed and are widely used. In this research, the performance of the models utilizing different filters is considered as discussed in the following sections.

Synthetic Minority Oversampling Technique Filter
Once the dataset is cleaned, it is essential to ensure that it is balanced. Imbalanced datasets might have a significant impact on the performance of the overall model. One way to balance the dataset is to reduce the number of instances of every class to match that of the class with the fewest instances. However, this reduces the amount of training data, which affects the training process. Another approach is to use the synthetic minority oversampling technique (SMOTE) to handle the imbalanced dataset [41].
SMOTE is a technique widely used to oversample the minority class. The oversampling is performed by generating synthetic examples of the minority class along the line segments that join each minority sample to some or all of its nearest minority-class neighbors. Figure 3 illustrates how the new samples w1, ..., w4 are generated between the existing minority-class samples y1, ..., y4 [41].
The SMOTE algorithm is employed using the following steps:
• For each minority-class sample, find its k nearest neighbors.
• Select one of these k nearest neighbors at random.
• Define the new sample as the original sample plus the difference between the selected neighbor and the original sample, multiplied by a random number between 0 and 1 (new sample = original sample + difference × random (0-1)).
• Add the newly generated samples to the minority class.
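The SMOTE steps listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the implementation used in the study: the function name `smote`, its parameters, and the toy data are assumptions, and the Euclidean distance is assumed for the neighbor search.

```python
import math
import random
from typing import List, Sequence


def smote(minority: List[Sequence[float]], n_new: int, k: int = 3,
          seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic samples by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Step 1: the k nearest minority neighbors of the chosen sample.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        # Step 2: pick one of those neighbors at random.
        neighbor = rng.choice(neighbors)
        # Step 3: new sample = original + difference * random number in [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


if __name__ == "__main__":
    minority_class = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy data
    # Step 4: add the synthetic samples to the minority class.
    balanced = [list(p) for p in minority_class] + smote(minority_class, n_new=4, k=2)
    print(len(balanced), "minority samples after oversampling")
```

Because every synthetic point lies on a segment between two existing minority samples, the oversampled class stays inside the region already occupied by that class.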

Feature Extraction
Feature extraction in image processing reduces data dimensions while obtaining the relevant information from the original data to improve the classification model accuracy and maximize the recognition rate. Feature extraction is performed by extracting relevant data, characterizing classes, and storing them in feature vectors to be inputted into the machine learning algorithm [42].
Transfer learning models are models pre-trained on a large image dataset and used as feature extractors, allowing previously gained knowledge to be transferred. For a small dataset, training a model from scratch is likely to result in low performance due to overfitting. Several pre-trained models based on convolutional neural network (CNN) architectures were developed to resolve this issue, such as VGG-16, VGG-19, DenseNet, and multilayer perceptron (MLP). These pre-trained models can be fine-tuned and used as feature extractors [42].
Visual Geometry Group (VGG) models can likewise extract features from images. Two VGG models, widely known as VGG-16 and VGG-19, were developed at Oxford as CNNs with 16 and 19 layers, respectively. These CNN models accept an input of 224 by 224 pixels in RGB format. The first convolutional layers have 64 filters, and the number of filters increases by a factor of 2 from block to block, reaching 512 filters in the last convolutional layers [43].
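The growth from 64 to 512 described above can be traced with a small calculation. The sketch assumes the standard VGG-16 layout of five convolutional blocks, each followed by 2 × 2 max pooling; the names `VGG16_BLOCKS` and `vgg16_feature_shapes` are illustrative and do not belong to any library.

```python
# Assumed standard VGG-16 layout: (number of conv layers, number of filters) per block.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def vgg16_feature_shapes(input_size=224):
    """Return (height, width, channels) after the pooling layer of each block."""
    shapes = []
    size = input_size
    for _, filters in VGG16_BLOCKS:
        # Padded 3x3 convolutions keep the spatial size; 2x2 pooling halves it.
        size //= 2
        shapes.append((size, size, filters))
    return shapes


if __name__ == "__main__":
    for shape in vgg16_feature_shapes():
        print(shape)
    conv_layers = sum(n for n, _ in VGG16_BLOCKS)
    print(conv_layers, "convolutional + 3 fully connected layers")
```

Under these assumptions, a 224 × 224 input ends as a 7 × 7 × 512 feature map, which is the kind of output that can be flattened or pooled into a feature vector for a downstream classifier.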
DenseNet is a pre-trained CNN model like the Visual Geometry Group (VGG) models. However, it requires fewer parameters because feature reuse removes the need to relearn redundant feature maps. In DenseNet, as shown in Figure 4, each layer is connected to all other layers, not only to adjacent layers as in other CNN architectures. This allows features to be passed to later layers without replication, reducing the number of parameters. Moreover, connecting all layers alleviates the vanishing-gradient problem, resulting in higher performance [44].
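To make the dense connectivity concrete, the following sketch counts how many feature maps each layer of a dense block receives when the outputs of all preceding layers are concatenated. The function names are hypothetical, and the example values (64 initial channels, growth rate 32, 6 layers) are assumed from the common DenseNet-121 configuration rather than taken from this study.

```python
def dense_block_channels(initial_channels, growth_rate, num_layers):
    """Input channels seen by each layer of a dense block, and the block output size."""
    inputs = []
    channels = initial_channels
    for _ in range(num_layers):
        inputs.append(channels)   # the layer sees every feature map produced so far
        channels += growth_rate   # its own new feature maps are concatenated on top
    return inputs, channels


def direct_connections(num_layers):
    """Number of direct layer-to-layer connections with dense connectivity."""
    return num_layers * (num_layers + 1) // 2


if __name__ == "__main__":
    per_layer, block_output = dense_block_channels(64, 32, 6)
    print(per_layer, block_output, direct_connections(6))
```

Each layer only adds a small, fixed number of new feature maps, which is why the parameter count stays low even though every layer can read all earlier features.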

Image Filter
Image filters are also widely used to map image features into feature space for input to the training model while ensuring that the features are sufficiently descriptive of the class. In this work, we use two filters: the auto-color correlogram filter and the fuzzy color and texture histogram (FcTH) filter. Unlike a color histogram, which only describes the color distribution of an image, the auto-color correlogram expresses how the spatial correlation among colors varies with distance. The absence of spatial information in a histogram might lead to false predictions: two images with similar color content are difficult for a histogram to tell apart, whereas a correlogram distinguishes them clearly because of the spatial information it encodes [45].
The FcTH filter likewise aims to map the visual features of an image to feature space while ensuring that the features are sufficiently descriptive of the class. It combines the color and texture information of images. A fuzzy system produces a fuzzy-linking histogram whose bins represent different image colors [46]. FcTH consists of three fuzzy units. The first fuzzy unit produces a 10-bin histogram based on the hue, saturation, value (HSV) color space. The second fuzzy unit expands the 10 bins to 24 bins, and the third unit expands them to 192 bins. The 192-bin histogram is then mapped into eight regions in the interval 0-7 using the Gustafson-Kessel fuzzy classifier [47].
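The histogram-versus-correlogram argument can be checked on two toy images that contain exactly the same colors in different arrangements. The sketch below is a simplified illustration, not the filter implementation used in the study: images are small grids of already-quantized color indices, the distance is the chessboard distance, and pixels outside the image are ignored. The function names and the two test images are assumptions.

```python
from collections import Counter


def color_histogram(img):
    """Fraction of pixels per color index."""
    counts = Counter(c for row in img for c in row)
    total = sum(counts.values())
    return {c: n / total for c, n in sorted(counts.items())}


def auto_correlogram(img, d=1):
    """For each color c: probability that a pixel at distance d from a c pixel is also c."""
    h, w = len(img), len(img[0])
    ring = [(dy, dx) for dy in range(-d, d + 1) for dx in range(-d, d + 1)
            if max(abs(dy), abs(dx)) == d]
    same, seen = Counter(), Counter()
    for y in range(h):
        for x in range(w):
            c = img[y][x]
            for dy, dx in ring:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    seen[c] += 1
                    same[c] += img[ny][nx] == c
    return {c: same[c] / seen[c] for c in sorted(seen)}


# Two 4x4 images with eight pixels of each color but different spatial layouts.
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
halves = [[0, 0, 1, 1] for _ in range(4)]

if __name__ == "__main__":
    print(color_histogram(checker) == color_histogram(halves))
    print(auto_correlogram(checker), auto_correlogram(halves))
```

The two histograms are identical, while the correlogram values differ because same-colored pixels are clustered in one image and interleaved in the other.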

Auto-Color Correlogram Filters
Several models were tested using different filters, individual algorithms, and stacked models to obtain the most accurate results. The first four models were developed using the auto-color correlogram filter. Three models were based on individual learning algorithms, namely KNN, SMO, and random forest, and a stacked model of both KNN and SMO was also developed, as shown in Figure 5. Table 2 shows the accuracy, precision, recall, and F1-score of these four models. As shown in the table, KNN with k = 1 has the highest accuracy of 97.5%, with a precision of 98.0%, when the data are split 90% for training and 10% for testing. When the training set is smaller (70% for training and 30% for testing), random forest has the highest accuracy of 92.2%, with a precision of 94.0%.
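For reference, the four reported metrics can be computed from the predicted and true labels as sketched below. The function name `binary_metrics` and the toy labels are illustrative assumptions; the labels are not results from this study, and label 1 is assumed to denote malicious traffic.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 1, 0]      # toy labels, 1 = malicious
    predicted = [1, 1, 0, 0, 0, 1, 1, 0]  # toy predictions
    print(binary_metrics(truth, predicted))
```

Precision penalizes false alarms on normal traffic, whereas recall penalizes missed attacks, which is why both are reported alongside accuracy.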

Auto-Color Correlogram and FcTH Filters
An additional five models were developed using the auto-color correlogram and FcTH filters. Three models were based on individual learning algorithms, like the first four models mentioned above, in addition to two stacked models, as shown in Figure 6. Table 3 presents the accuracy of these five models. As shown in the table, both random forest and the stacked model with JRip and random forest as base-level classifiers and SMO as the meta classifier have the highest accuracy of 97.5% and a precision of 96.2% when the data are split 90% for training and 10% for testing. However, the RF model has a higher recall of 100%, compared with 98.1% for the stacked model. When the training set is smaller (70% for training and 30% for testing), random forest has the highest accuracy of 95%.
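The idea of base-level classifiers feeding a meta classifier can be sketched with standard-library Python. The example deliberately uses toy stand-ins, a nearest-neighbor rule and a nearest-centroid rule as base learners and a simple lookup table as the meta level, not the JRip, random forest, or SMO learners used in the study. All function names and the data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def knn_predict(train_X, train_y, x, k=1):
    """Base learner 1: majority label among the k nearest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def centroid_predict(train_X, train_y, x):
    """Base learner 2: label of the closest class centroid."""
    sums, counts = {}, Counter(train_y)
    for xi, yi in zip(train_X, train_y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    return min(centroids, key=lambda c: math.dist(centroids[c], x))


BASE_LEARNERS = [knn_predict, centroid_predict]


def base_outputs(train_X, train_y, x):
    return tuple(learner(train_X, train_y, x) for learner in BASE_LEARNERS)


def fit_stack(X, y, folds=3):
    """Train the meta level on out-of-fold predictions of the base learners."""
    votes = defaultdict(Counter)
    for f in range(folds):
        kept = [i for i in range(len(X)) if i % folds != f]
        kX, ky = [X[i] for i in kept], [y[i] for i in kept]
        for i in range(f, len(X), folds):  # held-out samples of this fold
            votes[base_outputs(kX, ky, X[i])][y[i]] += 1
    # Meta classifier: most frequent true label per combination of base predictions.
    return {key: c.most_common(1)[0][0] for key, c in votes.items()}


def predict_stack(meta, X, y, x):
    key = base_outputs(X, y, x)
    # Fall back to the first base learner for combinations unseen during training.
    return meta.get(key, key[0])


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]  # toy data
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    meta = fit_stack(X, y)
    print(predict_stack(meta, X, y, (0.2, 0.3)), predict_stack(meta, X, y, (5.5, 5.5)))
```

Training the meta level on out-of-fold predictions keeps it from simply memorizing base-learner outputs on data those learners have already seen.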

DenseNet Transfer Model
Four additional models were developed using the DenseNet transfer model. Three were based on the same individual learning algorithms as the first four models, and the fourth was a stacked model with KNN and SMO as base-level classifiers and KNN as the meta classifier, as shown in Figure 7. Table 4 shows the accuracy of these four models. The random forest model has the highest accuracy, 96.6%, with a precision of 92.9%, when the data are split 90% for training and 10% for testing. With the smaller training set (a 70% to 30% split), KNN with k = 5 has the highest accuracy, 94.4%, with a precision of 91.7%.
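The stacking arrangement described above, in which base-level predictions are passed to a meta classifier, can be sketched as follows. This is an illustration under stated assumptions rather than the configuration used in the study: the second base learner is a nearest-centroid classifier swapped in for SMO to keep the example dependency-free, the meta-level features are built from out-of-fold predictions over five folds, and the class names, k values, and toy data are all hypothetical.

```python
import math
from collections import Counter


class KNN:
    """k-nearest neighbors with Euclidean distance and majority voting."""

    def __init__(self, k):
        self.k = k

    def fit(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)
        return self

    def predict(self, x):
        nearest = sorted(range(len(self.xs)),
                         key=lambda i: math.dist(self.xs[i], x))[:self.k]
        return Counter(self.ys[i] for i in nearest).most_common(1)[0][0]


class NearestCentroid:
    """Simple stand-in for the second base learner (NOT an SMO/SVM)."""

    def fit(self, xs, ys):
        groups = {}
        for x, y in zip(xs, ys):
            groups.setdefault(y, []).append(x)
        self.centroids = {y: [sum(col) / len(pts) for col in zip(*pts)]
                          for y, pts in groups.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: math.dist(self.centroids[y], x))


class Stacked:
    """Base-level predictions become the input features of a meta classifier."""

    def __init__(self, base_factories, meta_factory, folds=5):
        self.base_factories = base_factories
        self.meta_factory = meta_factory
        self.folds = folds

    def _encode(self, predictions):
        # Turn predicted labels into numbers so a distance-based meta learner works.
        return [float(self.classes.index(p)) for p in predictions]

    def fit(self, xs, ys):
        self.classes = sorted(set(ys))
        meta_x = [None] * len(xs)
        for fold in range(self.folds):
            kept = [i for i in range(len(xs)) if i % self.folds != fold]
            models = [make().fit([xs[i] for i in kept], [ys[i] for i in kept])
                      for make in self.base_factories]
            # Out-of-fold predictions: each sample is scored by models that never saw it.
            for i in range(fold, len(xs), self.folds):
                meta_x[i] = self._encode([m.predict(xs[i]) for m in models])
        self.meta = self.meta_factory().fit(meta_x, ys)
        # Final base models are refit on all training data for use at prediction time.
        self.bases = [make().fit(xs, ys) for make in self.base_factories]
        return self

    def predict(self, x):
        return self.meta.predict(self._encode([m.predict(x) for m in self.bases]))


# Toy feature vectors, interleaved so that every fold contains both classes.
toy_x, toy_y = [], []
for i in range(20):
    toy_x += [(0.1 * i, 0.0), (5.0 + 0.1 * i, 5.0)]
    toy_y += ["normal", "malicious"]

model = Stacked([lambda: KNN(k=1), NearestCentroid],
                lambda: KNN(k=3)).fit(toy_x, toy_y)
print(model.predict((0.3, 0.2)), model.predict((5.5, 4.8)))
```

Training the meta classifier on out-of-fold predictions keeps it from simply learning the base models' fit to their own training data. Whether the study's stacked models were built this way is not stated in this section.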

VGG-16 Transfer Model
Four further models were then developed using VGG-16 as the transfer model. They use the same training algorithms as the DenseNet models, as shown in Figure 8. Table 5 shows the accuracy of these four models. The stacked model has the highest accuracy, 98.3%, with a precision of 96.3%, when the data are split 90% for training and 10% for testing. With the smaller training set (a 70% to 30% split), KNN with k = 5 has the highest accuracy, 95.8%, with a precision of 95.1%.

Comparison of Different Feature Extractors
This section evaluates the proposed models by feature extractor. For 10-fold cross-validation, the auto-color correlogram and FcTH filters gave the highest accuracy, 94.7%, when combined with random forest, with a precision of 93.5%, as shown in Table 6. For the 70% to 30% data split, the highest accuracy among the results reported above is 95.8%, obtained with VGG-16 and KNN (k = 5); see Table 7. VGG-16 also gave the highest accuracy for the 90% to 10% data split: 98.3%, with a precision of 96.3%, when combined with the stacked model, as shown in Table 8.
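The 10-fold cross-validation used for the first comparison can be sketched as an index partition: the data are cut into ten folds, and each fold serves once as the test set while the other nine are used for training. The function name is hypothetical, and the sketch assumes contiguous, unshuffled folds. Whether the study shuffled or stratified the folds is not stated here.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if fold < n_samples % k else 0)
                  for fold in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Every sample lands in exactly one test fold.
folds = list(k_fold_indices(103, k=10))
print([len(test) for _, test in folds])
```

The reported cross-validation accuracy would then be the mean of the ten per-fold accuracies. This makes it less sensitive to one particular split than a single hold-out estimate.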

Analysis of the Results
In this work, seventeen models were evaluated with different feature extractors and different classification algorithms. The models were evaluated on accuracy, precision, recall, and F1-score, with more emphasis on accuracy and precision. The objective is to obtain the highest accuracy together with the highest precision. High precision matters because it keeps to a minimum the malicious traffic that is wrongly classified as normal, which would jeopardize network security.
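The four metrics can be computed from the counts of a binary confusion matrix, as in the sketch below. Which class counts as positive is not stated in this section. The sketch assumes normal traffic is the positive class, so that a false positive is malicious traffic labeled as normal, which matches the concern about precision described above. The function name and the example counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts.

    Assumed convention: positive = normal traffic, so
    fp = malicious traffic wrongly labeled normal,
    fn = normal traffic wrongly labeled malicious.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, not taken from the study's tables.
print(classification_metrics(tp=8, fp=2, fn=0, tn=10))
```

Under this convention, a model can reach 100% recall while still letting some malicious traffic through, which is why precision is weighted alongside accuracy here.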
The highest precision and accuracy are achieved when VGG-16 is combined with the stacked model, which uses KNN and SMO as base-level classifiers and KNN with k = 3 as the meta classifier, for the 90% to 10% data split. Therefore, this model was selected as the best model.

Conclusions
The Internet of Things is a new technological paradigm that aims to improve daily life by automating normal daily operations in all aspects of life without human intervention. With the continuous increase in Internet of Things (IoT) device use, more interest is shown in internet security, specifically focusing on protecting these vulnerable devices from malicious traffic. Such threats are difficult to distinguish, so advanced detection systems are becoming necessary.
This study aimed to develop a model with the highest performance in distinguishing malicious from normal traffic. Various feature extraction techniques and machine learning algorithms were used to achieve this objective. The experiments show that the choice of feature extraction technique is important for attaining high performance, and the VGG-16 transfer model gave the highest accuracy and precision. The study also investigated the effect of individual and stacked machine learning algorithms and the impact of the data split ratio on model performance. The experiments showed that the stacked model achieved the highest accuracy, 98.3%, when combined with the VGG-16 transfer model.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.