A Hierarchical Deep Learning-Based Intrusion Detection Architecture for Clustered Internet of Things

The Internet of Things (IoT) system’s ever-expanding attack surface calls for a new intrusion detection system (IDS). These systems may include thousands of wireless devices that need to be protected from cyberattacks. Recent research efforts have used machine learning to analyze and identify various attacks and abnormal behavior on IoT systems. Most of these techniques are characterized by low accuracy and do not scale to today’s IoT-enabled smart city applications. This article proposes a secure automatic two-level intrusion detection system (SATIDS) which utilizes the minimum redundancy maximum relevance (MRMR) feature selection technique and an enhanced version of long short-term memory (LSTM) based on an artificial recurrent neural network (RNN) to enhance the IDS performance. SATIDS aims at detecting traffic anomalies with greater accuracy while also reducing the time it takes to perform this task. The proposed algorithm was trained and evaluated using two of the most recent datasets based on realistic data: the ToN-IoT and InSDN datasets. The performance analysis of the proposed system shows that it can differentiate between attacks and normal traffic, identify the attack category, and finally define the type of sub-attack with high accuracy. Comparing the performance of the proposed system with existing IDSs reveals that it outperforms its best rivals from the literature in detecting many types of attacks. It improves accuracy, detection rate, F1-score, and precision. Using 500 hidden units and two LSTM layers achieves an accuracy of 97.5%, precision of 98.4%, detection rate of 97.9%, and F1-score of 98.05% on the ToN-IoT dataset, and precision of 99%, detection rate of 99.6%, and F1-score of 99.3% on the InSDN dataset. Finally, SATIDS was applied to an IoT network which utilizes the energy harvesting real-time routing protocol (EHRT). EHRT optimizes the low-energy adaptive clustering hierarchy (LEACH) routing technique using a modified artificial fish swarm algorithm. 
The integration between the optimized LEACH and the proposed IDS enhances the network lifetime, energy consumption, and


Introduction
IoT is a powerful platform for connecting everyday objects and users worldwide without human interference. It links devices with the Internet, enabling smart identification, tracing, positioning, administration, and monitoring of objects and people. It has several uses, including healthcare, transport, and smart grids. Despite their many advantages, IoT networks are susceptible to cyberattacks which affect their operation, stability, and energy consumption. To address some of the IoT security issues, an intrusion detection system (IDS) is used to detect traffic abnormalities, protect the network from attacks, and reduce functional and financial losses by alerting the network administrator about those behaviors. An IDS may use signature rules, anomaly-based rules, or a combination of these to increase its intrusion detection performance [1][2][3].
In the last decade, IDSs started to utilize machine learning (ML) or deep neural network (DNN) techniques to increase their attack-detection efficiency. The more extensive DNN networks are called deep learning (DL), which is considered a subfield of ML. DL contains many hidden layers to solve complicated problems in various applications. The input of each layer is the output of the previous layer. Although DL results in high performance, its main limitation is that it requires a longer training time and more training data. Convolutional neural networks (CNN) and recurrent neural networks (RNN) are two popular types of DL models. CNN is used in image processing and language processing, while RNN is used in natural language and text processing. The long short-term memory (LSTM) network is considered an extension of the RNN which is capable of learning patterns in long sequences. LSTM can be used to classify traffic as normal or attack. The primary benefit of this form of DL is that the raw data is used directly [3][4][5].

Problem Statement
Based on previous research, we conclude that IoT applications face numerous security-related concerns and obstacles. With the proliferation of cyberattacks in the IoT environment, the entire system may become susceptible to attack. The primary obstacles are outlined below:
- Due to the dynamic nature of IoT networks, which span IoT devices, fog, and cloud, designing a distributed security framework for distributed IoT applications is a difficult issue. Additionally, increasingly capable attackers can intercept traffic on the IoT transmission network.
- Distributed analysis of the massive amounts of data (big data) produced by IoT devices is a complex problem that requires an IoT security mechanism that uses the right analytical tool in a distributed working architecture.
- As datasets grow in size and complexity, maintaining network security becomes increasingly difficult due to the increasing computational complexity involved in measuring the dataset parameters. There are fewer intrusions than non-intrusions, making it more challenging to train a model to recognize intrusions [6,7].
- Most IoT devices are powered by batteries. Energy is consumed during data collection, processing, and packet transmission or reception; hence an intrusion detection system that limits overall network energy consumption is necessary [8][9][10].

Our Contribution
The main contributions of this research are summarized as follows:
- An efficient IDS is proposed which utilizes the minimum redundancy maximum relevance (MRMR) feature selection technique and an enhanced version of long short-term memory (LSTM) based on an artificial recurrent neural network (RNN). The proposed IDS detects traffic anomalies with greater accuracy while also reducing the time it takes to perform this task. The MRMR feature selection method is exploited to reduce the dataset dimension and size by selecting only the most essential features. Because it removes irrelevant features, it helps train IDSs more quickly, with low computation and high accuracy.
- The performance of the proposed IDS was analyzed using two of the most recent datasets based on realistic data: the ToN-IoT and InSDN datasets [11,12].
- The proposed MRMR-SATIDS was applied to a cluster-based IoT network which uses an optimized hierarchical routing technique to address the energy consumption issue and conserve device energy.
- Finally, the performance of the proposed system was compared with and without the MRMR feature selection technique, and then compared to other ML-based IDSs.
The remainder of the paper is structured as follows: Contributions of other researchers to the IDS field are presented in Section 2. Section 3 describes the methods and methodology used in this work. The proposed system is presented in Section 4. Performance analysis of the proposed system and a comparison to existing systems are presented in Section 5. Section 6 concludes the article and presents future work avenues.

Related Work
Deep learning (DL) helps researchers and cyber analysts gain insights into threats and stay ahead of opponents. Attack detection continues to be an active area of research that receives considerable attention due to the wide range of attack vectors and the possible impact of attacks on IoT systems. Many research efforts have been carried out to detect network attacks using DL. Using network traffic analysis, an SDN-based intrusion detection system that is executed as an application module on the controller was developed in [3]. In [4], the Bot-IoT dataset was used to train DRNN and SMOTE-DRNN models, and the simulation results show that high class imbalance in the training data has a negative impact on the DRNN model's precision, recall, F1 score, area under the receiver operating characteristic curve (AUC), geometric mean (GM), and Matthews correlation coefficient (MCC). In [5], the authors recommended adopting Blockchain to protect federated learning (FL) algorithms in Internet of Things (IoT) devices from attacks. Integrating Blockchain and FL ensures the integrity of trained models, preventing model poisoning attacks. In [6], the authors suggested a clustering method based on unsupervised feature selection and cluster center initialization for intrusion detection. This method computes initial cluster centers using sets of semi-identical examples that signal dense data space and avoids outliers. A model was created in [7] that extracts important features from the given features and then classifies intrusions using a deep learning approach. Notably, the underlying data points cannot be considered samples from a single distribution, but rather from two distinct distributions: one generic to all network intrusions and the other domain-specific. A deep belief network (DBN) algorithm was advocated in [8] for intrusion detection systems. The survey presented in [9] offers a thorough analysis of LEACH descendant clustering procedures. This is the first study to categorize LEACH-based routing protocols into CH selection, data transmission, and combined CH selection and data transmission methods.
The authors in [10] suggested a cyberattack detection framework based on ensemble learning and fog-cloud architecture for IoMT networks. The ensemble design incorporates the individual learners: decision tree, naive Bayes, and random forest. A new method is presented in [13], based on a long short-term memory (LSTM) autoencoder and a one-class support vector machine (OC-SVM), which trains the models with examples of the normal class only in order to detect anomaly-based attacks in an unbalanced dataset. Using a variety of machine learning algorithms and ensemble learning, [14] compares several of these intrusion detection systems. Using a random forest regressor technique, a subset of significant attributes is selected from the primitive set of attributes, and this subset is then utilized to train different classifiers. Table 1 shows the most significant existing IDS systems.
Table 1 (surviving row). Reference: [19]. Description: The suggested model extracts the features of recurrent model hidden layers and then uses a kernel-based principal component analysis (KPCA) feature selection strategy to find the best features. Finally, the best features of the recurrent models are combined, and classification is performed with an ensemble meta-classifier. Environment: SDN-IoT. Results: 99% in network attack detection; 97% in network attack classification.

Database Description
The performance of an AI-powered IDS is dependent on the quality of the dataset used for its training. The lack of an up-to-date, real-world generated dataset is one of the key obstacles in training an effective AI-based IDS. The main reasons that public datasets are not available in the intrusion detection area are privacy and legal considerations. To verify the proposed system's performance in two different environments, it was trained and evaluated using two of the most recent realistic datasets, namely the ToN-IoT and InSDN datasets, which are introduced in the following subsections. These datasets were selected to test the system's robustness and generalization and to say with a certain confidence that the proposed IDS behaves well in both SDN and IoT environments. The ToN-IoT dataset simulates an IoT environment with 43 features and nine types of attacks, while the InSDN dataset enables some cyberattacks using OpenFlow protocols in the SDN architecture, with 79 features and eight types of attacks.

ToN-IoT Dataset
The ToN-IoT dataset is obtained from a practical and large-scale network developed by the UNSW Canberra Cyber IoT Lab [11]. The dataset contains a range of routine and cyberattack events from IoT networks. A newer testbed was built at the same IoT lab to connect a range of virtual computers, physical tools, hacking systems, cloud and fog systems, and IoT sensors to simulate the functionality and scalability of the automotive IoT and Enterprise 4.0 networks. This generated dataset contains different recent smart city-based attacks such as DoS, DDoS, and ransomware. These attacks have been deployed against web applications, IoT gateways, and computer systems across the IoT network.
This dataset contains 43 features with 461,043 total observations, divided into 300,000 normal and 161,043 attack observations. The dataset was divided into 70% and 30% for the training and test sets, respectively. Table 2 shows the statistics of normal traffic and the different attack vectors as presented in the dataset [10].

InSDN Dataset
The InSDN dataset covers many scenarios and attack classes, including Probe, DoS application, web attacks, brute force attack, password guessing, U2R, and DDoS attacks [12]. In addition, InSDN regular traffic also encompasses several common features. The attacks in the dataset originate from both an internal and an external network to imitate actual attack scenarios. It covers about 80 statistical features in CSV format, such as protocol, duration, byte number, and packet number. The overall number of dataset instances for normal and attack traffic is 343,939: a total of 68,424 for normal traffic and 275,515 for attack traffic, as shown in Table 3. The attack traffic in InSDN emulates normal behavior, since both normal and malicious traffic are transmitted to the SDN controller for decision-making. Furthermore, the centralized perspective of the SDN network and the separation of the data plane from the control plane give the attacker new options for numerous attacks compared to the traditional network. These intrusions are not easy to detect since the intruder is permitted to access the victim's server. As a result, such a data collection might be a good indicator for model assessment that reflects the real-world scenario. Furthermore, the InSDN dataset contains no redundant records, which prevents the learner model from being biased toward the most common records [12,13,15].

LSTM Network
A DNN is a type of neural network which has a hierarchical structure with many hidden layers. The network becomes deeper as the number of hidden layers increases. DNNs have become popular among cybersecurity researchers in the last few years. LSTM is an RNN that uses the input information to predict the next element of a sequence according to the detected pattern. The recurrence of the network delay, which represents the complex system output, is the RNN's most essential characteristic. Furthermore, RNNs maintain an activation vector that makes the RNN a very deep neural network. However, due to the exploding and vanishing gradient problems, it is difficult to train RNNs to rely on long-term dependencies in time series data [20].
The LSTM architecture is a good choice for solving the RNN vanishing gradient problem. LSTM uses a memory cell, which can represent long-term dependencies in sequential data. Figure 1 shows the internal description of the LSTM memory cell. The four internal memory gates are responsible for regulating the interlinkage between memory units. The input gate decides whether the input signal will alter the memory cell's state or not. On the other hand, the output gate determines which part of the memory state is exposed, depending on the previous input. The forget gate can decide to forget its state (or remember it), under the following equations, at time t:
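The gate equations referred to above are not reproduced in this section. As an illustration only, the following minimal sketch implements one time step of the widely used standard LSTM formulation for a single hidden unit with a scalar input. The symbol names (`i_t`, `f_t`, `o_t`, `g_t`, `c_t`, `h_t`) and the `params` layout are assumptions made for this example and may differ from the notation of the original equations; the candidate update `g_t` is sometimes counted as a fourth gate.

```python
import math


def sigmoid(z):
    """Logistic function used by the input, forget and output gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One time step of a standard LSTM cell with one hidden unit.

    params maps each gate name ('i', 'f', 'o', 'g') to a tuple
    (input weight, recurrent weight, bias). This layout is hypothetical.
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x_t + w_h * h_prev + b

    i_t = sigmoid(pre_activation("i"))    # input gate: how much new content to write
    f_t = sigmoid(pre_activation("f"))    # forget gate: how much old state to keep
    o_t = sigmoid(pre_activation("o"))    # output gate: how much state to expose
    g_t = math.tanh(pre_activation("g"))  # candidate cell content
    c_t = f_t * c_prev + i_t * g_t        # updated cell (memory) state
    h_t = o_t * math.tanh(c_t)            # updated hidden state (cell output)
    return h_t, c_t


# Usage: with all weights and biases at zero every gate equals 0.5,
# so half of the previous cell state is kept and nothing new is written.
zero_params = {gate: (0.0, 0.0, 0.0) for gate in "ifog"}
h, c = lstm_cell_step(1.5, 0.3, 2.0, zero_params)
```

A large positive forget-gate bias drives `f_t` toward 1, which is how the cell preserves information over many time steps.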

MRMR-Based Feature Selection
To deal with the high dimensionality of huge datasets, feature selection (FS) approaches are used. FS aids an IDS in both increasing its performance and decreasing the amount of time required to train its model. To create better IDS models, a subset of the original features is chosen. FS methods can be categorized by the level of supervision that is available: supervised, unsupervised, and semi-supervised. Filter, wrapper, and embedded methods are all examples of different FS approaches [21,22].
MRMR-FS is one of the most popular FS methods available. Bioinformatics and multimedia processing are only two examples of the many fields where it is widely employed. It is a forward-selection filter approach that uses mutual information as an indicator of dependence. In MRMR, features are picked based on their relevance to the target variable while also minimizing the correlation between them [22,23].
MRMR-FS detects the most effective features as follows:
I. The feature with the highest relevance, $\max_{x \in \Omega} V_x$, is selected and added to an empty feature set, S. The relevance is calculated using Equations (6) and (7), where x and y are two random variables, C is the output, MI is the mutual information of features x and y, and $M(C, x_i)$ is the mutual information of feature $x_i$ with class C.
II. The features with nonzero relevance and zero redundancy are identified in $S^c$, the complement of S. If $S^c$ does not contain a feature with nonzero relevance and zero redundancy, go to Step V.
III. The feature with the largest relevance, $\max_{x \in S^c,\, W_x = 0} V_x$, is selected and added to set S.
IV. Steps II and III are repeated until the redundancy is nonzero for all features in $S^c$.
V. The feature with nonzero relevance and nonzero redundancy that has the largest mutual information quotient (MIQ) value within $S^c$ is selected and added to set S using Equation (8), where x and y are two random variables and I(x, y) is the mutual information between x and y.
VI. Step V is repeated until no feature in $S^c$ has any remaining relevance.
VII. Features with zero relevance are added to S in random order.
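The selection steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: features are assumed to be discrete, relevance is taken as the mutual information between a feature and the class, redundancy as the mean mutual information between a feature and the already selected ones, and MIQ as relevance divided by redundancy. The function names are hypothetical, and zero-relevance features are appended in their original order rather than randomly so that the output is reproducible.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (va, vb), n_ab in count_ab.items():
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[va] * count_b[vb]))
    return mi


def mean_redundancy(name, selected, features):
    """Mean mutual information between one feature and the selected set."""
    if not selected:
        return 0.0
    total = sum(mutual_information(features[name], features[s]) for s in selected)
    return total / len(selected)


def mrmr_rank(features, target, eps=1e-12):
    """Rank features (dict: name -> discrete values) against class labels."""
    relevance = {n: mutual_information(col, target) for n, col in features.items()}
    remaining = [n for n in features if relevance[n] > eps]
    zero_relevance = [n for n in features if relevance[n] <= eps]
    selected = []
    while remaining:
        redundancy = {n: mean_redundancy(n, selected, features) for n in remaining}
        non_redundant = [n for n in remaining if redundancy[n] <= eps]
        if non_redundant:
            # Relevant and non-redundant features go first, by relevance.
            best = max(non_redundant, key=lambda n: relevance[n])
        else:
            # Otherwise pick the largest relevance-to-redundancy quotient (MIQ).
            best = max(remaining, key=lambda n: relevance[n] / redundancy[n])
        selected.append(best)
        remaining.remove(best)
    return selected + zero_relevance


# Usage: 'a' and 'b' jointly determine the class, 'a_copy' duplicates 'a',
# and 'noise' is independent of the class.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
noise = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * ai + bi for ai, bi in zip(a, b)]
ranking = mrmr_rank({"a": a, "a_copy": list(a), "b": b, "noise": noise}, labels)
print(ranking)  # ['a', 'b', 'a_copy', 'noise']
```

The duplicated feature is ranked after the two complementary ones even though its relevance is just as high, which is the redundancy penalty at work.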

Modified Artificial Fish Swarm Algorithm (AFSA)
AFSA is one of the best swarm intelligence algorithms for optimization. Fish schooling and swarm intelligence inspired the development of population-based evolutionary computing techniques. AFSA solves optimization problems based on the movement and intelligence of swarms in the process of finding food. To solve the optimization problem, random and parallel search is used to mimic fish behaviors such as following and swarming in order to find the best possible solution for the entire population, as in Figure 2. In AFSA, fish have three feeding behaviors: preying, swarming, and following [24]. The three possible behaviors of an AF are described in the following:

• Preying behavior: A fish's basic activity is to prey on other fish in order to relocate to an area where there is more food available. This behavior is modeled in AFSA using an AF's current position and the area immediately surrounding it:
$$\vec{x}_j = \vec{x}_i + \mathrm{Visual} \cdot \mathrm{Rand}(0,1)$$
where $\vec{x}_i$ is the AF's current position, $\vec{x}_j$ is a candidate position within its visual range (Visual), and Rand(0,1) is a random number between 0 and 1.
• Following behavior: When an AF locates a spot where there is a greater concentration of food, other fish will follow it.
As the number of artificial fish (AFs) increases, AFSA's convergence speed and accuracy are improved. However, as the number of AFs rises, so does the algorithm's complexity. As the number of iterations increases, so does the effect of an AF's fitness function value on the ultimate optimal outcome in AFSA, regardless of whether that value is too large or too small. As a result, the algorithm complexity is increased by the use of these weak fish. For maximum or minimum optimization, if a weak AF's fitness function value exceeds or falls below a defined threshold, it will be swallowed in the AFSA computations [25,26].
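To make the preying and following behaviors concrete, the sketch below shows one common way to code them for a maximization problem. It is an illustration under assumptions rather than the modified AFSA used in this work: the parameter names (`visual`, `tries`), the symmetric random offset in `prey`, and the rule of moving a random fraction of the way toward the best neighbor in `follow` are choices made for this example. Swarming and the swallowing of weak fish are omitted.

```python
import math
import random


def prey(position, fitness, visual, tries, rng):
    """Preying: sample nearby points and move to the first one with more food."""
    for _ in range(tries):
        candidate = [x + visual * rng.uniform(-1.0, 1.0) for x in position]
        if fitness(candidate) > fitness(position):
            return candidate
    return position  # nothing better found within the allowed tries


def follow(position, school, fitness, visual, tries, rng):
    """Following: move toward the best neighbor in visual range, else prey."""
    neighbours = [p for p in school
                  if p is not position and math.dist(p, position) <= visual]
    if neighbours:
        best = max(neighbours, key=fitness)
        if fitness(best) > fitness(position):
            step = rng.random()  # fraction of the way toward the better fish
            return [x + step * (b - x) for x, b in zip(position, best)]
    return prey(position, fitness, visual, tries, rng)


def afsa_maximise(fitness, school, visual=1.0, tries=5, iterations=50, seed=0):
    """Run the simplified search and return the best position found."""
    rng = random.Random(seed)
    for _ in range(iterations):
        school = [follow(p, school, fitness, visual, tries, rng) for p in school]
    return max(school, key=fitness)


# Usage: three artificial fish searching for the maximum of a concave function.
def food(p):
    return -((p[0] - 3.0) ** 2)


start = [[0.0], [1.0], [5.5]]
best_position = afsa_maximise(food, start, seed=1)
```

For a concave fitness function such as the one in the usage example, neither behavior can lower a fish's fitness, so the best position found is at least as good as the best starting position.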

Energy Harvesting Real-Time Routing Protocol (EHRT)
EHRT depends on the principle of clustering using the LEACH routing protocol, which is optimized by a modified AFSA that modifies the fitness function of AFSA [9,25,26,27]. The EHRT protocol consists of the following two phases:

• Phase I: Cluster Formation and CH Election. In this step, clusters are established and cluster heads (CHs) are selected depending on the clustering algorithm's guiding principle. Equal segmentation of the area space is utilized to form clusters. Each cluster's CH is the node that has the highest energy and the lowest distance from the base station (BS). As all sensor nodes initially have the same energy, in our architecture, the node in the cluster's center, which is central to its neighboring nodes, is designated as the initial CH. The processes of cluster creation and CH election are shown in Figure 3. After this election, the BS runs the modified AFSA (Phase II) to select the optimal set of CHs at the beginning of each round.

• Phase II: Optimal CH Selection Using Modified AFSA. As CHs spend a significant amount of energy during the data collection and transfer phase, EHRT rotates the CH role among the sensor nodes of each cluster using the modified AFSA at the start of each transmission round. As a result, energy waste is avoided and the energy usage of each node is optimized to extend the network's lifetime. Figure 4 depicts the CH selection procedures.
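The election criterion of Phase I (highest energy, lowest distance from the BS) can be illustrated with a short sketch. How the two criteria are combined is not specified here, so the lexicographic rule below (energy first, distance to the BS as a tie-breaker) is an assumption, as are the function name and the node record layout; the modified AFSA fitness function of Phase II is not reproduced.

```python
import math


def elect_cluster_heads(nodes, base_station=(0.0, 0.0)):
    """Pick one cluster head per cluster.

    nodes: list of dicts with 'id', 'cluster', 'pos' (x, y) and 'energy' (J).
    Returns {cluster: node id}. Higher energy wins; ties go to the node
    closest to the base station (hypothetical combination rule).
    """
    best_per_cluster = {}
    for node in nodes:
        score = (node["energy"], -math.dist(node["pos"], base_station))
        current = best_per_cluster.get(node["cluster"])
        if current is None or score > current[0]:
            best_per_cluster[node["cluster"]] = (score, node["id"])
    return {cluster: node_id for cluster, (_, node_id) in best_per_cluster.items()}


# Usage: nodes 1 and 2 have equal energy, so the one nearer the BS is elected.
example_nodes = [
    {"id": 1, "cluster": 0, "pos": (10.0, 10.0), "energy": 0.2},
    {"id": 2, "cluster": 0, "pos": (5.0, 5.0), "energy": 0.2},
    {"id": 3, "cluster": 0, "pos": (1.0, 1.0), "energy": 0.1},
    {"id": 4, "cluster": 1, "pos": (50.0, 50.0), "energy": 0.15},
]
heads = elect_cluster_heads(example_nodes)
print(heads)  # {0: 2, 1: 4}
```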

The Proposed MRMR-SATIDS System
The main objective of any IDS is to improve the accuracy of attack detection and to discover the attack type. This section provides a detailed description of the proposed SATIDS system, which depends on both the MRMR FS technique and an LSTM network for classification. Then, after its training and testing, SATIDS is applied to a WSN to study its performance when dealing with a large number of mobile sensor nodes and its effect on energy consumption.
Figure 5 presents the inner structure of the proposed SATIDS system. Because some input data contain negative values, a rectified linear unit (ReLU) layer applies a threshold operation at zero to each input element: it passes positive inputs unchanged and sets negative values to zero. Then, a number of LSTM layers are used; each layer is followed by a dropout layer with a probability of 0.2. The dropout probability of 0.2 was chosen by trial and error to give the highest overall performance metrics, as presented in Figure 6. The dropout layer randomly sets input elements to zero to avoid overfitting, which decreases the network performance parameters. The initial learning rate was set to 0.0001 and the LSTM batch size to 2000 to provide the highest performance parameters. A fully connected layer flattens the output of the previous layers and converts it to a single vector. Finally, the output of the softmax layer enters the classification layer, which classifies the data as benign or as one of the attack types.

Training and Testing of SATIDS
Figure 7 and Algorithm 1 summarize the different phases of the proposed MRMR-SATIDS. To accurately detect the attack type, the following steps are followed:
1. Preparation of the dataset: For continuous features, NaN and infinity values are replaced with the mean of the corresponding column. In addition, the IP address and timestamp columns are removed. The two datasets contain raw data. If this raw data is input into the proposed system, slow convergence will occur. Normalization strongly influences the performance of a neural network; with unnormalized data, the numerical ranges of features may vary strongly. The data (x) is normalized by subtracting the mean (µ) of each feature and dividing by the standard deviation (σ), so that each feature has a mean of 0 and a standard deviation of 1, which results in faster convergence:
$$x := \frac{x - \mu}{\sigma}$$
2. Feature selection process: The important features of the dataset are extracted using MRMR-FS, and the dataset is modified accordingly.
3. The dataset is divided into 70% for training and 30% for testing.
4. Training process: The LSTM-based SATIDS network is trained using the modified training part of the dataset. The feature values are separated from the class labels. Each line of data enters the network separately.
5. After training, the network is ready to identify the category of data (normal or malicious).
6. When the testing part enters the trained model, it is categorized as normal traffic or a type of attack.
7. The network performance is analyzed by comparing the system output to the expected testing result.
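The dataset preparation and splitting steps can be sketched as follows. This is a minimal standard-library illustration, not the authors' MATLAB code: the function names are hypothetical, the population standard deviation is assumed for the normalization, and the random shuffle before the 70/30 split is an assumption, since the splitting procedure is not described.

```python
import math
import random
import statistics


def clean_column(values):
    """Replace NaN and infinite entries with the mean of the finite entries."""
    finite = [v for v in values if math.isfinite(v)]
    mean = statistics.fmean(finite)
    return [v if math.isfinite(v) else mean for v in values]


def standardize(values):
    """Z-score normalization: x := (x - mu) / sigma."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - mu) / sigma for v in values]


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into training and testing parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(train_fraction * len(rows)))
    return rows[:cut], rows[cut:]


# Usage on a toy feature column containing one NaN and one infinite value.
raw = [1.0, float("nan"), 3.0, float("inf")]
cleaned = clean_column(raw)        # [1.0, 2.0, 3.0, 2.0]
scaled = standardize(cleaned)      # mean 0, standard deviation 1
train_rows, test_rows = train_test_split(range(10))
```

Computing µ and σ on the training part only and reusing them for the test part is a common alternative that keeps test-set statistics out of training.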

Applying SATIDS to an IoT system
After testing, SATIDS is applied to a large-scale IoT network with mobile nodes. To simulate this environment, 250 sensor nodes are distributed randomly over a 300 m × 300 m region divided into eight clusters (seventh level hierarchy), as shown in Figure 8. The BS is located at (0,0). The initial energy of each node is set to 0.2 J and its threshold energy is 50 mJ. Each node uses 50 nJ/bit for the transmission and reception process. Energy dissipation is 100 pJ/bit and the data packet size is set to 4000 bits. Attacked nodes are randomly distributed between the clusters. Algorithm 2 presents SATIDS's procedures to secure an IoT system.
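As a rough worked example of what these parameters imply, the sketch below counts how many 4000-bit packets a node can handle before its energy falls from 0.2 J to the 50 mJ threshold, charging only the 50 nJ/bit radio cost. This is a simplification made for illustration: the 100 pJ/bit dissipation term is ignored because its dependence on distance is not stated here, and the function name is hypothetical. Integer nanojoules are used to avoid floating-point rounding.

```python
NJ_PER_BIT = 50                    # radio cost per bit, in nanojoules
PACKET_BITS = 4000                 # data packet size
INITIAL_ENERGY_NJ = 200_000_000    # 0.2 J expressed in nanojoules
THRESHOLD_ENERGY_NJ = 50_000_000   # 50 mJ expressed in nanojoules


def packets_until_threshold(initial_nj, threshold_nj, nj_per_bit, packet_bits):
    """Whole packets a node can handle before reaching its energy threshold."""
    per_packet_nj = nj_per_bit * packet_bits   # 200,000 nJ = 0.2 mJ per packet
    return (initial_nj - threshold_nj) // per_packet_nj


budget = packets_until_threshold(
    INITIAL_ENERGY_NJ, THRESHOLD_ENERGY_NJ, NJ_PER_BIT, PACKET_BITS
)
print(budget)  # 750
```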

AFSA with a modified fitness function is used to optimize a clustering approach based on EHRT. Compared with selecting a single cluster behavior, the fitness function has been adjusted so that the optimal CHs for each cluster are selected. Cluster formation and optimal CH election are the two phases of the EHRT technique (described in Section 3.5). Algorithm 2 summarizes the application of the proposed MRMR-SATIDS to an IoT network.
Algorithm 2: Applying MRMR-SATIDS to a WSN network
OUTPUT: Detecting the source of the anomalous traffic and determining the attack type
INPUT: Randomly distributed IoT nodes
CHs are selected according to the EHRT algorithm
Each CH aggregates traffic from its neighboring nodes
Each CH uses the proposed SATIDS to detect any anomalous traffic from its neighboring nodes
Determine the ID of this node and send its ID to the BS
BS saves the IDs of attacked nodes in a blacklist and shares it with other nodes
FOR the received traffic from CHs
    BS uses the proposed SATIDS to scan received traffic from all CHs
    IF any attacked CH is detected
        BS declares its ID and the election process is repeated in that cluster to elect another CH
    END IF
END FOR
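The control flow of Algorithm 2 can be sketched as follows. This is an illustration only: `is_anomalous` stands in for the trained SATIDS classifier and `elect_new_head` for the EHRT re-election, both of which are hypothetical callables supplied by the caller, and traffic is reduced to one record per node.

```python
def secure_round(clusters, cluster_heads, is_anomalous, elect_new_head):
    """One screening round over all clusters.

    clusters: {cluster_id: {node_id: traffic}}
    cluster_heads: {cluster_id: node_id}
    Returns (blacklist of attacked node IDs, updated cluster heads).
    """
    heads = dict(cluster_heads)
    blacklist = set()
    # Each CH screens the traffic aggregated from its neighboring nodes
    # and reports the IDs of attacked nodes to the BS.
    for cluster_id, members in clusters.items():
        for node_id, traffic in members.items():
            if node_id != heads[cluster_id] and is_anomalous(traffic):
                blacklist.add(node_id)
    # The BS screens the traffic received from every CH and triggers
    # a new election in any cluster whose CH is attacked.
    for cluster_id, head in list(heads.items()):
        if is_anomalous(clusters[cluster_id][head]):
            blacklist.add(head)
            heads[cluster_id] = elect_new_head(cluster_id, blacklist)
    return blacklist, heads


# Usage with toy traffic labels in place of real feature vectors.
toy_clusters = {0: {1: "ok", 2: "attack", 3: "ok"}, 1: {4: "attack", 5: "ok"}}
toy_heads = {0: 1, 1: 4}
blacklisted, new_heads = secure_round(
    toy_clusters,
    toy_heads,
    is_anomalous=lambda traffic: traffic == "attack",
    elect_new_head=lambda cid, excluded: min(
        n for n in toy_clusters[cid] if n not in excluded
    ),
)
print(sorted(blacklisted), new_heads)  # [2, 4] {0: 1, 1: 5}
```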

Performance Metrics
The proposed SATIDS system was simulated, trained, and tested using MATLAB on an Intel Core-i7 processor with 8 GB RAM on the Windows 10 platform. The performance of the proposed system was analyzed using the confusion matrix, which is a table structure commonly used for multi-class assessment. It consists of four main values determined for each class [17]:

• True Positive (TP): The number of intrusion observations in the IoT network traffic correctly classified by the algorithm as intrusions.
• True Negative (TN): The number of normal observations in the dataset traffic correctly classified by the algorithm as normal.
• False Positive (FP): The number of normal IoT traffic observations wrongly identified by the algorithm as abnormal.
• False Negative (FN): The number of abnormal observations in IoT network traffic wrongly identified by the algorithm as normal.
To demonstrate the efficiency of the proposed system, the number of LSTM layers and the number of hidden units in each LSTM layer were varied. The following performance metrics were recorded for each type of attack, once using all features and once using the MRMR-FS technique:

• Accuracy: The ratio of correctly classified observations to the total number of samples in the dataset.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
• Precision: The number of correctly detected observations divided by the total number of observations that the model identifies as an attack.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
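The four metrics used in this section can be computed directly from the confusion-matrix counts. The sketch below uses the standard formulas for accuracy and precision together with the detection rate and F1-score of Equations (14) and (15); the function name `ids_metrics` and the example counts are illustrative only and are not results from the paper. Counts are taken per class, with the class of interest treated as positive.

```python
def _ratio(numerator: float, denominator: float) -> float:
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def ids_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, detection rate (recall), and F1-score."""
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    precision = _ratio(tp, tp + fp)
    detection_rate = _ratio(tp, tp + fn)
    f1 = _ratio(2 * precision * detection_rate, precision + detection_rate)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "detection_rate": detection_rate,
        "f1": f1,
    }


# Made-up counts, used only to exercise the formulas.
example = ids_metrics(tp=90, tn=80, fp=10, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Because the F1-score is the harmonic mean of precision and detection rate, it always lies between the two.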

ToN-IoT Dataset
Figure 9 presents the performance of the proposed SATIDS system with and without MRMR-FS when detecting normal/anomaly traffic. As seen in Figure 9, SATIDS has its highest performance when using MRMR-FS, two LSTM layers, and 500 hidden units: 97.5% accuracy and 98.05% F1-score. In the case of detecting a backdoor attack, as shown in Figure 10, the system has almost its highest performance when using MRMR-FS, two LSTM layers, and 500 hidden units: 97% precision and 97.28% F1-score. The detection rate values are almost equal in all cases.
Figures 11 and 12 represent the performance of the proposed SATIDS system in detecting DOS and ransomware attacks, respectively. As shown, the network has its highest performance when using three LSTM layers and 500 hidden units with the MRMR-FS technique. The performance of the proposed SATIDS system in detecting DDOS, injection, and scanning attacks is shown in Figures 13-15, respectively. As seen in Figures 13-15, the network also has its highest performance when using three LSTM layers and 500 hidden units, but without the FS technique. The proposed SATIDS network achieves its highest performance in detecting password attacks when using two LSTM layers and 500 hidden units without the FS technique, as shown in Figure 16. SATIDS has its highest performance in detecting XSS attacks when using three LSTM layers and 400 hidden units, as shown in Figure 17.

InSDN Dataset
Figures 18-23 show the performance metrics of the SATIDS system when using the InSDN dataset for classifying the type of received traffic. The system has its highest performance when using two LSTM layers and 500 hidden units with MRMR-FS for detecting normal/anomaly traffic, DDOS attacks, and probe attacks, as in Figures 18-20, respectively.
For detecting DOS attacks, the system has its highest performance when using four LSTM layers and 500 hidden units without the MRMR-FS technique, as shown in Figure 21. However, as shown in Figures 22 and 23, the highest performance of the system occurs when using three LSTM layers and 500 hidden units with the MRMR-FS technique for detecting BFA and BOTNET attacks, respectively.

Performance Comparison
Table 4 shows a performance comparison between the proposed MRMR-SATIDS system and other approaches using the ToN-IoT dataset. GRAD [28] outperforms MRMR-SATIDS, RF, and NB in detecting normal, backdoor, DOS, and XSS attacks with a difference in DR ranging from 0.2% to 2.8%. However, SATIDS outperforms the technique in [28], RF, and NB with the same percentage difference in DR in the case of DOS, injection, and ransomware attacks. MRMR-SATIDS also outperforms RF and NB in detecting DOS and XSS attacks, with a DR difference ranging from 5% to 70%.

Implementing the Proposed SATIDS System in an IoT System
The MRMR-SATIDS algorithm is implemented in an IoT network that contains 250 randomly distributed nodes (225 normal nodes + 25 attacked nodes). The region is divided into eight clusters (seventh level hierarchy). To conserve power, the EHRT routing protocol is used. By applying both the trained MRMR-SATIDS algorithm and the EHRT routing protocol, the IoT network conserves device power and protects data during the transmission process. Figure 24 depicts the alert messages that the CHs generate when an attacker node is detected. These messages contain the node ID and the cluster number. The BS saves the node IDs in a blacklist and declares it to other nodes.
Figure 25 represents the network lifetime after 400 rounds. The red curve represents the network lifetime without applying the MRMR-SATIDS security technique, while the green curve represents the network lifetime when applying it. As shown in Figure 25, the difference between the two curves arises because the security algorithm prevents attack nodes from becoming CHs. Although the network lifetime decreases by 5.33% when using the proposed security system, the system security increases.
The residual energy after 400 rounds is represented in Figure 26. The red points represent the normal nodes, while the green points represent the attacker nodes. As seen in the figure, after 400 rounds, only 30% of the normal nodes reach the threshold energy. Those nodes are the ones that were elected many times to be the CH. Only 8% of the attacked nodes reached the threshold, since the BS did not select any of the attacked nodes as a CH after putting them in a blacklist. The proposed system was able to detect 24 out of 25 attacked nodes.
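The node-level figures reported above can be restated as a short calculation. The sketch below only re-derives quantities from the numbers given in the text (25 attacked nodes, 24 detected, 8% of attacked nodes at the threshold energy); the variable names are illustrative.

```python
from fractions import Fraction

TOTAL_NODES = 250
ATTACKED_NODES = 25
NORMAL_NODES = TOTAL_NODES - ATTACKED_NODES  # 225 normal nodes
DETECTED_ATTACKED = 24

# Share of attacked nodes that the deployed system flagged.
node_detection_rate = Fraction(DETECTED_ATTACKED, ATTACKED_NODES)
missed = ATTACKED_NODES - DETECTED_ATTACKED

# 8% of the attacked nodes reached the threshold energy.
attacked_at_threshold = Fraction(8, 100) * ATTACKED_NODES

print(f"node-level detection rate: {float(node_detection_rate):.0%}")  # 96%
print(f"missed attacked nodes: {missed}")                              # 1
print(f"attacked nodes at threshold energy: {attacked_at_threshold}")  # 2
```

In other words, the deployed system flagged 96% of the attacked nodes, and only two attacked nodes drained to the threshold energy.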

Robustness and Generalization Test
In machine learning and deep learning, robustness and generalization are critical. To demonstrate the robustness and generalizability of the proposed method, its performance is evaluated on two different network intrusion detection datasets, namely ToN-IoT and InSDN, for both network attack detection and network attack classification. Figures 9-23 exhibit detailed performance results for network attack detection and classification. On both datasets, the proposed model detected network attacks with high accuracy: 97.5% accuracy on ToN-IoT and 99% precision on InSDN. These results suggest that the proposed model is robust and generalizable across the two datasets. The proposed approach, however, may not be resilient or generalizable to an adversarially updated dataset or an adversarial environment; that case study is not covered in this paper.

Conclusions and Future Work
IoT is an extremely powerful platform for connecting objects to the Internet. However, IoT applications are susceptible to many security vulnerabilities that need to be mitigated. Users' safety and privacy concerns are on the rise due to the massive attack surface of IoT systems. Improved security measures can be achieved with the use of ML. Finding the best ML-based IDS techniques is a never-ending challenge.
This research offers a two-level deep learning methodology called SATIDS, based on the LSTM network, to improve the performance of IDS systems. The MRMR feature selection technique is utilized to reduce the large number of features in the dataset. Using the MRMR technique improved the proposed IDS performance and its ability to detect various types of attacks.
Two of the latest realistic IoT datasets, namely ToN-IoT and InSDN, are used to evaluate the efficiency of the proposed system. These datasets contain various types of attacks that could cause IoT failure. Using MRMR offers a much better overall system performance. Using 500 hidden units and two LSTM layers achieves accuracy of 97.5%, precision of 98.4%, detection rate of 97.9%, and F1-score of 98.05% on the ToN-IoT dataset, and precision of 99%, DR of 99.6%, and F1-score of 99.3% on the InSDN dataset. The suggested model outperformed the compared approaches from the literature in detecting many types of attacks. The proposed MRMR-SATIDS system succeeded in detecting a wide range of attacks. This indicates the suggested model's strength and flexibility for use with other network traffic datasets.
Finally, the trained algorithm is implemented in a power-saving IoT system that contains 250 mobile nodes to approximate a real network. This network used the EHRT routing protocol to save sensor energy and extend the network's lifetime.
The following changes will be made in future work and are anticipated to increase the proposed system's decision accuracy:
- Analyze the performance of the proposed system against FGSM and JSMA attacks [29] to determine how robust the system is in an adversarial environment. We will continue to concentrate on the defense strategies that resist such attacks in the future.
- Focus on developing methods for enhancing the SATIDS techniques to achieve better results for low-record attacks and detecting zero-day attacks on IoT systems.
- Study the performance of the proposed SATIDS on a large, mobile, and heterogeneous real IoT network.

Figure 2. The relationship between an artificial fish and its surroundings. "Reprinted/adapted with permission from Ref. [24]".
$\vec{x}_i(t)$: the current i-th position of AF; $\vec{x}_v$: the random i-th position of AF; $v$: visual length; $u(0, 1)$: random number; $\vec{u}$: vector of values uniformly distributed in (−1, 1).

• Swarming behavior: To avoid danger and crowds, fish gather together in nature. It is a common feature in AFSA that fish swarms behave this way, according to the next equation.

Figure 7 and Algorithm 1 summarize the different phases of the proposed MRMR-SATIDS. To accurately detect the attack type, the following steps are followed:

Algorithm 1: Training and Testing the Proposed MRMR-SATIDS
OUTPUT: Traffic type (normal/anomaly) and the attack type
INPUT: ToN-IoT/InSDN datasets
Normalize dataset
Remove IP address and timestamp from the dataset
Select the more effective features and eliminate the less effective ones using MRMR-FS
Divide the dataset into 70% for training and 30% for testing
Train the proposed IDS using the training dataset, which contains traffic data and labels
FOR each data line
    Train network with each traffic record and consider the traffic label
END FOR
Insert the testing dataset into the trained IDS
Classify traffic (normal or a type of attack)
Calculate IDS performance
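The preprocessing steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' MATLAB code: all function names are hypothetical, min-max scaling is assumed as the normalization, and the greedy feature selection uses the absolute Pearson correlation as a simple stand-in for the mutual-information scores that MRMR normally relies on.

```python
import random


def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5 if var_a > 0 and var_b > 0 else 0.0


def mrmr_select(rows, labels, k):
    """Greedy mRMR: pick features with high relevance and low redundancy."""
    cols = [list(c) for c in zip(*rows)]
    relevance = [abs(pearson(c, labels)) for c in cols]
    selected, remaining = [], list(range(len(cols)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(abs(pearson(cols[j], cols[s])) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def train_test_split(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle and split into training and testing parts (70/30 by default)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Toy data: feature 1 duplicates feature 0, feature 2 adds new information.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rows = [[1, 2, 0], [2, 4, 0], [1, 2, 1], [2, 4, 0],
        [8, 16, 1], [9, 18, 1], [8, 16, 0], [9, 18, 1]]
features = mrmr_select(min_max_normalize(rows), labels, k=2)
print(features)
```

On the toy data, the redundant copy (feature 1) is skipped in favor of the less relevant but non-redundant feature 2, which is the behavior the minimum-redundancy term is meant to produce.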

• Detection rate (DR): The ratio of correctly detected observations to the total number of observations that actually belong to the class.
$$\mathrm{DR} = \frac{TP}{TP + FN} \tag{14}$$
• F1-score: The harmonic mean of precision and recall, where recall is the detection rate.
$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{15}$$

Figure 9. Performance comparison of the proposed system using ToN-IoT dataset.

Figure 18. Performance comparison of the proposed system using the InSDN dataset.

Table 4. Performance comparison of DR (%) using the ToN-IoT dataset.