A Hybrid Deep Learning Model with Self-Improved Optimization Algorithm for Detection of Security Attacks in IoT Environment

Abstract: With the growth of the Internet of Things (IoT), security attacks are also rising steadily. Numerous centralized mechanisms have been introduced in the recent past for the detection of attacks in IoT, in which an attack recognition scheme is deployed at a vital point of the network, gathers data from the network, and categorizes it as “Attack” or “Normal”. Nevertheless, these schemes have not achieved noteworthy results because of the diverse requirements of IoT devices, such as distribution, scalability, low latency, and resource limits. The present paper proposes a hybrid model for the detection of attacks in an IoT environment that involves three stages. Initially, the higher-order statistical features (kurtosis, variance, moments), mutual information (MI), symmetric uncertainty, information gain ratio (IGR), and relief-based features are extracted. Then, detection takes place using a Gated Recurrent Unit (GRU) and Bidirectional Long Short-Term Memory (Bi-LSTM) to recognize the existence of network attacks. To improve the classification accuracy, the weights of Bi-LSTM are optimally tuned via a self-upgraded Cat and Mouse Optimizer (SU-CMO). The improvement of the employed scheme is established on two distinct datasets using a variety of metrics, which comprise classification accuracy, rand index, f-measure, and MCC. In terms of all performance measures, the proposed model outperforms both traditional and state-of-the-art techniques.


Introduction
IoT has impacted our everyday lives through its rapid advancement, and it is a rising technology that provides connectivity for automated operations and services in diverse fields [1]. IoT is perceived as a system interrelated via servers, sensors, and different software. Due to their lesser processing power, IoT appliances are deployed in several security areas. As additional devices are deployed on IoT, security and privacy issues are gaining increasing attention [2]. IoT attacks are causing severe losses to IoT networks and even threatening human security. Moreover, IoT networks have characteristics that make attack detection more difficult. First, the range of platforms, hardware, software, and protocols exposes diverse vulnerabilities. Second, higher-rate and lower-rate attacks are widely used by IoT hackers to steal legitimate data [3]. The lower-rate attacks are difficult to notice and can persist on the network. Finally, attackers have become more intelligent and can actively modify their attack strategies according to environmental feedback to avoid detection, making it harder for defenders to find consistent patterns for recognizing attacks [4].
As the number of IoT devices increases, attackers have many opportunities to compromise them through malicious email, secrecy attacks, and denial of service (DoS) attacks, amongst other types of attacks [5]. Attacks can come from the channels that connect the IoT elements. The protocols deployed in IoT systems have security issues that affect the whole system. IoT systems are also susceptible to well-known network attacks such as DoS and spoofing of software and appliances [6].
Certain reports declared that 70% of IoT devices are subjected to different network attacks that exploit diverse vulnerabilities, such as those in encryption and password security. The domains in which IoT is broadly used include smart homes, intelligent transport systems, agriculture, hospitals, earthquake detection, and so on [7,8]. For malevolent agents, a vulnerable IoT appliance makes IoT devices a launch pad for attacks on varied domains [9]. Therefore, a protected IoT infrastructure is essential for protecting IoT devices from attacks. In this paper, a hybrid deep learning model that identifies security attacks in an IoT environment is implemented. Two deep learning models are used, GRU and Bi-LSTM; the weights of Bi-LSTM are optimally tuned via a self-upgraded cat and mouse optimizer (SU-CMO) algorithm to refine classification accuracy. Furthermore, two separate datasets are utilized to compare the proposed model's classification accuracy to other existing schemes.
The present paper follows three main steps, which are as under:
• Suggests a new attack detection model in IoT, where various diverse features are derived.
• Deploys hybrid classifiers such as GRU and Bi-LSTM with an optimization strategy to detect attacks.
• Exploits an SU-CMO model to choose the optimal weights in Bi-LSTM.
Section 2 reviews the related work. Section 3 gives a stepwise description of the proposed model, and Section 4 explains the extraction of diverse features. Section 5 describes the SU-CMO-based hybrid classification. Section 6 illustrates the outcomes, and the conclusion is presented in Section 7.

Related Work
In 2020, Mandal et al. [1] stated that ML supported various fields for service betterment. Moreover, IoT plays a major role in enabling interaction among humans through a single device. Furthermore, due to the emergence of digital technology, conveying data with no human communication is possible. ML was deployed in the aspect of privacy and security patterns. Here, the recognition of intrusion and the challenges of security in IoT were the main focus. Moreover, a variety of attacks were analyzed depending on surface attacks. The major aim was to improve the effectiveness of recognizing attacks by executing the ML approach.
In 2021, Kan et al. [10] proposed an attack detection technique for an IoT system based on adaptive APSO-CNN. The APSO-CNN optimized the parameters of a 1-D CNN. The cross-entropy loss of the CNN was used as the fitness value. Additionally, an assessment technique was defined that considered both the prediction and the predicted labels to evaluate the APSO-CNN algorithm. From the simulated outcomes, the efficiency and reliability of APSO-CNN were proven regarding attack detection.
In 2020, Nimbalkar et al. [11] explored various attacks in IoT due to the vulnerabilities in devices. They stated that recognition of attacks was a tedious procedure for ML techniques due to the existence of traffic features in IoT systems. Their work offered a feature selection for intrusion detection systems for the detection of DoS and DDoS attacks. Using the inclusion operation and union operation, the subsets of features in the developed system were obtained. In the end, the enhancement of the deployed scheme was proven.
In 2020, Pecor et al. [12] explained the contribution of IoT to the everyday lives of humans. They established the recognition of traffic in the network and the classification of the detected network. In this work, a large dataset was introduced for detecting the traffic in the network. With the aid of the DN model, they scrutinized binary and multinomial classification.
In 2020, Rahman et al. [13] developed a scalable ML model for detecting intrusions in IoT-facilitated smart cities. Their work addressed the limitations of centralized IDS by proposing semi-distributed and distributed techniques. Additionally, feature extraction and feature selection were performed. For allocating the tasks, parallel ML techniques were developed. Their results reported accuracy and building time performance on attack detection.
In 2020, Atul et al. [14] presented an effective method for sharing and relocating information for digital communication. Certain system challenges were mentioned, namely, failure in service, abnormality, and security barriers. This work analyzed and presented a communication pattern utilizing the EASH approach. The abnormality sources of the communication paradigm were differentiated by employing the ML technique. The performance, accuracy, and effectiveness were calculated for the developed method.
In 2021, Krishna and Thangavelu [15] examined the DoS attack in IoT systems. The security issues and attacks that occurred in IoT devices were demonstrated. For detecting the attack, two algorithms, such as hybrid ML-F, were proposed. Here, the developed scheme attained the utmost performance in classifying the attacks.
In 2020, Gu et al. [16] examined the security and privacy issues in IoT in detail. They described that IoT attacks were causing considerable losses to IoT networks and threatening the security of humans. Here, a reinforcement learning-oriented attack detection scheme was proposed that detected the attack pattern and its transformation. In this work, the IoT traffic features were also explored, and entropy-based metrics were used to forecast the attacks in IoT. Furthermore, extensive experiments were performed over the IoT dataset, and the efficiency of the developed model was revealed.
In 2022, Gopali et al. [17] identified anomalies in the Internet of Things environment by employing a recurrent neural network, including LSTM. In addition, the authors evaluated and contrasted deep learning strategies, e.g., CNN, reporting on their effectiveness. In this study, it was observed that the LSTM model with the fastest learning rate had the highest accuracy, although it required a longer time to be trained.
In 2022, Ahmed et al. [18] offered a comprehensive analysis of numerous low-rate distributed denial of service (LDDoS) detection algorithms that are being used for SDN. The distributed denial of service (DDoS) assault has recently evolved into the LDDoS attack, which is more difficult to detect and creates more of a challenge. The authors demonstrate that techniques employing deep learning combined with a hybrid model, such as CNN-LSTM and CNN-GRU, may achieve the desired results.
In 2022, Abbas et al. [19] demonstrated that traditional approaches and technologies are ineffective in addressing new security concerns and difficulties, and how machine learning as a promising technology enables the creation of a wide variety of effective ways that can improve the safety and security of the IoT.

Review
Table 1 shows a review of the approaches used in the detection of attacks in IoT. In [1], an ML algorithm was used to find the attacks that occurred in the IoT network. High accuracy and a lower false rate were achieved using the machine learning classification algorithm. However, some of the security issues were not considered in this approach. APSO-CNN was applied in [10], which performs effective and reliable detection of attacks in IoT networks, but it does not differentiate the complicated task of interruption. The JRip classifier was deployed in [11] and achieved higher performance, accuracy, and detection rate, but the detection was applied to only a particular dataset. NN was exploited in [12], which offered higher accuracy and performance rate. However, layer specification was not performed in this analysis. In [13], the MLP algorithm was used to detect attacks, which provided a high level of performance and a higher feature set. However, it tests only the detection rate and not efficiency. In [14], attack detection was performed using the EASH algorithm, in which a higher rate of accuracy was found. ML-F was used in [15], which provided a higher detection of attacks in IoT networks. However, it does not categorize some of the attacks. The Markov Decision technique was used in [16] for the quick detection of attacks in IoT networks. This attained high accuracy for the feature set. However, ANN was not detected accurately.

A Stepwise Description of the Proposed Model
The developed attack detection model comprises three essential phases.
• Initially, features including "kurtosis, variance, moments, mutual information, symmetric uncertainty, information gain ratio, and relief-based features" are derived.
• These features are then subjected to the optimized GRU and Bi-LSTM, which recognize the presence of attacks.
• Here, the weights of Bi-LSTM are optimally tuned via SU-CMO.

Extraction of Diverse Features
This work extracts the following features from the input data. A brief explanation of the features is as follows:
Kurtosis [20]: "It is a measure that identifies whether the data are light-tailed or heavy-tailed relative to the normal distribution". Datasets with smaller kurtosis have light tails, or few outliers, whereas datasets with high kurtosis have heavy tails, or many outliers. The arithmetic formulation of kurtosis $KS$ for univariate data $Y_1, Y_2, \ldots, Y_k$ is articulated in Equation (1).
$$KS = \frac{\sum_{i=1}^{k} (Y_i - \bar{Y})^4 / k}{s^4} \tag{1}$$
Here, $\bar{Y}$ is the mean and $s$ is the standard deviation of the data. The standard deviation is computed with $k$ in the denominator when computing the kurtosis.
Variance [21]: It is defined as the mean squared disparity amongst every data point and the center of distribution computed by mean.
Moment [22]: In probability theory and statistics, a central moment is the expected value of a specified integer power of the deviation of a random variable from its mean. The higher-order moments are related to the spread and shape of the distribution rather than to its location. The m th central moment of a real-valued random variable Q is the quantity $\mu_m = eo[(Q - eo[Q])^m]$, in which eo stands for the expectation operator. For a continuous univariate probability distribution (UPD) with probability density function (PDF) $f(y)$, the m th moment about the mean $\mu$ is given in Equation (2).
$$\mu_m = \int_{-\infty}^{\infty} (y - \mu)^m f(y)\, dy \tag{2}$$
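As an illustration of the statistical features described above, the following is a minimal sketch in plain Python. It assumes the population forms (division by the number of samples, as stated for the kurtosis) and uses hypothetical helper names (`central_moment`, `variance`, `kurtosis`, `statistical_features`) and made-up sample values; it is not the implementation used in this work.

```python
def central_moment(values, order):
    """Return the central moment of the given order (population form)."""
    count = len(values)
    mean = sum(values) / count
    return sum((y - mean) ** order for y in values) / count


def variance(values):
    """Mean squared deviation from the mean (second central moment)."""
    return central_moment(values, 2)


def kurtosis(values):
    """Fourth central moment divided by the squared population variance."""
    var = variance(values)
    if var == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return central_moment(values, 4) / var ** 2


def statistical_features(values, max_order=4):
    """Collect variance, kurtosis, and central moments of order 3..max_order."""
    features = {"variance": variance(values), "kurtosis": kurtosis(values)}
    for order in range(3, max_order + 1):
        features[f"moment_{order}"] = central_moment(values, order)
    return features


# Hypothetical values of one traffic attribute (illustrative only).
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
features = statistical_features(sample)
print(features)
```

In this form the kurtosis of a normal distribution is 3, so values well above 3 point to heavy tails.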
MI Features: MI measures the information shared between two ensembles of random variables N and Z. It is formulated as revealed in Equation (3), in which ρ signifies probability.
$$MI(N; Z) = \sum_{n \in N} \sum_{z \in Z} \rho(n, z) \log \frac{\rho(n, z)}{\rho(n)\,\rho(z)} \tag{3}$$
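A minimal sketch of the MI computation for two discrete variables is given below. It assumes that probabilities are estimated by relative frequencies and that the logarithm is taken to base 2; the function name `mutual_information` and the toy data are illustrative assumptions, not part of the original experiments.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate MI (in bits) between two equally long discrete sequences."""
    total = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / total
        p_x = count_x[x] / total
        p_y = count_y[y] / total
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# Toy example: a feature that fully determines the class, and one that does not.
labels = ["Attack", "Attack", "Normal", "Normal"]
informative = [1, 1, 0, 0]
uninformative = [1, 0, 1, 0]
print(mutual_information(informative, labels))
print(mutual_information(uninformative, labels))
```

A feature that fully determines a balanced binary class carries one bit of information, whereas a feature independent of the class carries none.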
Symmetrical uncertainty: SU scores the features based on the SU correlation metric evaluated between the class and the feature. The MI is calculated as in Equation (4), wherein MI denotes mutual information, $fe$ stands for a feature, $cl$ stands for the class, and $P$ implies the probability function. Further, SU is formulated as in Equation (5), in which $En$ refers to the entropy function.
$$MI(fe, cl) = \sum P(fe, cl) \log_2 \frac{P(fe, cl)}{P(fe) \cdot P(cl)} \tag{4}$$
$$SU(fe, cl) = \frac{2\, MI(fe, cl)}{En(fe) \cdot En(cl)} \tag{5}$$
IGR [23]: It is the ratio between the information gain (IG) and the split information (SI) value, as shown in Equation (6), where K refers to a random variable and b refers to an attribute. In Equation (7), U(ti) refers to the number of times ti occurs, U(t) refers to the total event count, and t refers to the event set.
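The IGR described above can be sketched as follows. This is an illustration under stated assumptions: the entropy is estimated from event counts (the number of occurrences of each event divided by the total event count), the information gain is the entropy of the class minus its conditional entropy given the attribute, and the split information is the entropy of the attribute itself. The names `entropy` and `information_gain_ratio` and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(events):
    """Shannon entropy (bits) from relative event frequencies."""
    total = len(events)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(events).values())


def information_gain_ratio(attribute, target):
    """Information gain of `attribute` about `target`, divided by split information."""
    total = len(target)
    groups = {}
    for value, label in zip(attribute, target):
        groups.setdefault(value, []).append(label)
    # Conditional entropy of the target after splitting on the attribute.
    conditional = sum(len(group) / total * entropy(group)
                      for group in groups.values())
    gain = entropy(target) - conditional
    split_information = entropy(attribute)
    return gain / split_information if split_information > 0 else 0.0


# Toy example with a hypothetical protocol attribute.
target = ["Attack", "Attack", "Normal", "Normal"]
protocol = ["udp", "udp", "tcp", "tcp"]      # separates the classes perfectly
flag = ["a", "b", "a", "b"]                  # unrelated to the class
print(information_gain_ratio(protocol, target))
print(information_gain_ratio(flag, target))
```

Dividing by the split information penalizes attributes that fragment the data into many small groups, which plain information gain tends to favor.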
Relief Features: Relief was modeled for application to binary classification problems with discrete or numerical features. It is modeled as in Equation (8), where E i points out the feature vector, the closest same-class instance is termed "nearHit", and the closest different-class instance is termed "nearMiss".
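A minimal sketch of a Relief-style weight computation is shown below. It assumes the commonly used update, in which a feature weight decreases with the squared difference to the nearHit and increases with the squared difference to the nearMiss, that features are already scaled to [0, 1], and that every instance is visited once instead of being sampled at random. The function name `relief_weights` and the data are illustrative assumptions.

```python
def relief_weights(samples, labels):
    """Return one Relief weight per feature for a binary-labelled dataset."""
    n_features = len(samples[0])
    weights = [0.0] * n_features

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    for index, (instance, label) in enumerate(zip(samples, labels)):
        hits = [s for j, (s, l) in enumerate(zip(samples, labels))
                if j != index and l == label]
        misses = [s for s, l in zip(samples, labels) if l != label]
        if not hits or not misses:
            continue  # nearHit or nearMiss does not exist for this instance
        near_hit = min(hits, key=lambda s: squared_distance(instance, s))
        near_miss = min(misses, key=lambda s: squared_distance(instance, s))
        for f in range(n_features):
            weights[f] += ((instance[f] - near_miss[f]) ** 2
                           - (instance[f] - near_hit[f]) ** 2)
    return [w / len(samples) for w in weights]


# Toy data: feature 0 separates the classes, feature 1 is noise.
samples = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
labels = [0, 0, 1, 1]
weights = relief_weights(samples, labels)
print(weights)
```

Features with larger weights differ more between classes than within a class, so they are the ones retained as relief-based features.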
The derived features are signified by Fe, which are then subjected to hybrid classification.

Hybrid Classifiers
LSTM classifier: The LSTM [24] includes a sequence of recurring LSTM cells. Each LSTM cell encompasses 3 units: "forget gate, the input gate, and the output gate". Let the variables Z and D indicate the hidden and cell states, in that order. (X t , D t−1 , Z t−1 ) and (Z t , D t ) indicate the input and output layers.
At time t, the output, input, and forget gates are denoted O t , I t , and F t , in that order. LSTM chiefly exploits F t for sorting the data. The sorted data indicate specified partial features connected to the previous gaze direction; F t is formulated as shown in Equation (9).
In Equation (9), (J ZF , L ZF ) and (J IF , L IF ) point out the weight and bias parameters that map the hidden and input layers to the forget gate, and the activation function is signified by σ.
The input gate is exploited by LSTM as revealed in Equations (10)-(12), wherein (J ZG , L ZG ) and (J IG , L IG ) imply the weight and bias parameters that map the hidden and input layers to the cell gate, respectively. (J ZI , L ZI ) and (J II , L II ) imply the weight and bias parameters that map the hidden and input layers to I t .
In addition, the LSTM cell obtains the output hidden layer from the output gate as shown in Equations (13) and (14), in which (J ZO , L ZO ) and (J IO , L IO ) represent the weight and bias that map the hidden and input layers to O t . Accordingly, the weights of LSTM represented by J are optimally elected by the proposed SU-CMO model.
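The gate computations described above can be sketched as a single LSTM step in plain Python. This sketch assumes the standard LSTM cell (sigmoid gates, a tanh cell candidate, and element-wise state updates) and borrows the J/L naming of the text for weights and biases; the dictionary keys such as `"J_ZF"` and the helper names are illustrative assumptions, and the random weights stand in for values that SU-CMO would tune.

```python
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def affine(weight, vector, bias):
    """Compute weight @ vector + bias with plain lists."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weight, bias)]


def init_params(n_in, n_hidden, scale=0.1, seed=0):
    """Random weights J and zero biases L for the F, I, G (cell) and O gates."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-scale, scale) for _ in range(cols)]
                for _ in range(rows)]

    params = {}
    for gate in "FIGO":
        params["J_I" + gate] = matrix(n_hidden, n_in)      # input -> gate
        params["L_I" + gate] = [0.0] * n_hidden
        params["J_Z" + gate] = matrix(n_hidden, n_hidden)  # hidden -> gate
        params["L_Z" + gate] = [0.0] * n_hidden
    return params


def lstm_step(x_t, z_prev, d_prev, params):
    """One LSTM step: returns the new hidden state Z_t and cell state D_t."""
    def gate(name, activation):
        from_input = affine(params["J_I" + name], x_t, params["L_I" + name])
        from_hidden = affine(params["J_Z" + name], z_prev, params["L_Z" + name])
        return [activation(a + b) for a, b in zip(from_input, from_hidden)]

    f_t = gate("F", sigmoid)      # forget gate
    i_t = gate("I", sigmoid)      # input gate
    g_t = gate("G", math.tanh)    # cell candidate
    o_t = gate("O", sigmoid)      # output gate
    d_t = [f * d + i * g for f, d, i, g in zip(f_t, d_prev, i_t, g_t)]
    z_t = [o * math.tanh(d) for o, d in zip(o_t, d_t)]
    return z_t, d_t


params = init_params(n_in=3, n_hidden=2)
hidden, cell = lstm_step([0.2, -0.1, 0.4], [0.0, 0.0], [0.0, 0.0], params)
print(hidden, cell)
```

A bidirectional variant would run one such cell over the sequence in each direction and combine the two hidden states at every time step.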
Bi-GRU [25]: Building on the RNN, LSTM integrates 3 gates, named "forget gate, input gate, and output gate", and a memory cell for controlling the flow of information across LSTM cells. Similarly, GRU deploys special gates, called reset and update gates, for lessening gradient dispersal with smaller computation losses. The update gate (ut) substitutes the forget and input gates of LSTM, portraying the retention degree of prior data, as revealed in Equation (15).
In Equation (15), µ points out the sigmoid activation function, whose output lies between 0 and 1, Fea t stands for the input matrix at time step t, R t−1 stands for the hidden state at the prior time step t − 1, W u stands for the weight matrix of ut, and f u stands for the bias matrix of ut. The reset gate (rt) regulates how much historical data has to be ignored, as revealed in Equation (16), wherein W r characterizes the weight matrix of rt and f r symbolizes the bias matrix of rt.
Subsequently, the candidate hidden state is revealed in Equation (17), wherein tanh stands for the tanh activation function, f R and W R stand for the bias matrix and weight matrix of the new cell state, and * stands for the dot multiplication function. Thus, the output R t is a linear interpolation between the candidate hidden state and R t−1 , as in Equation (18).
The forward GRU captures the previous details of the input data and the backward GRU obtains the upcoming details of the input data. The Bi-GRU is modeled as in Equation (19), wherein ← R t and → R t correspond to the hidden states of the backward and forward GRU, in that order, and Ct corresponds to the technique for combining the outputs of the two directions.
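The GRU gates and the bidirectional combination described above can be sketched as follows. The sketch assumes the standard GRU formulation, in which the new state is (1 − u) times the previous state plus u times the candidate state, and it assumes concatenation as the combining technique Ct; both are assumptions, since other conventions exist. Parameter keys such as `"W_u"` and `"f_u"` follow the symbols of the text, while the function names and random weights are illustrative.

```python
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def affine(weight, vector, bias):
    """Compute weight @ vector + bias with plain lists."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weight, bias)]


def init_gru(n_in, n_hidden, scale=0.1, seed=0):
    """Random weights and zero biases for update, reset and candidate parts."""
    rng = random.Random(seed)
    params = {}
    for name in ("u", "r", "R"):
        params["W_" + name] = [[rng.uniform(-scale, scale)
                                for _ in range(n_hidden + n_in)]
                               for _ in range(n_hidden)]
        params["f_" + name] = [0.0] * n_hidden
    return params


def gru_step(fea_t, r_prev, p):
    """One GRU step on input fea_t with previous hidden state r_prev."""
    joined = r_prev + fea_t
    update = [sigmoid(v) for v in affine(p["W_u"], joined, p["f_u"])]
    reset = [sigmoid(v) for v in affine(p["W_r"], joined, p["f_r"])]
    gated = [g * h for g, h in zip(reset, r_prev)] + fea_t
    candidate = [math.tanh(v) for v in affine(p["W_R"], gated, p["f_R"])]
    # Linear interpolation between the previous state and the candidate state.
    return [(1.0 - u) * h + u * c for u, h, c in zip(update, r_prev, candidate)]


def run_gru(sequence, p, n_hidden):
    """Return the hidden state after every time step of the sequence."""
    state = [0.0] * n_hidden
    states = []
    for fea_t in sequence:
        state = gru_step(fea_t, state, p)
        states.append(state)
    return states


def bi_gru(sequence, p_forward, p_backward, n_hidden):
    """Concatenate forward and backward hidden states at every time step."""
    forward = run_gru(sequence, p_forward, n_hidden)
    backward = run_gru(sequence[::-1], p_backward, n_hidden)[::-1]
    return [f + b for f, b in zip(forward, backward)]


sequence = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
p_fwd = init_gru(3, 2, seed=1)
p_bwd = init_gru(3, 2, seed=2)
outputs = bi_gru(sequence, p_fwd, p_bwd, n_hidden=2)
print(outputs)
```

Each output vector holds the forward state, which summarizes the inputs up to that step, followed by the backward state, which summarizes the inputs from that step onward.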

SU-CMO Model
The extant CMBO [26] model gives optimal solutions; still, it suffers from low accuracy. To overcome the drawbacks of the conventional CMBO, specific improvements were made. The steps in the SU-CMO model and its flowchart (Figure 2) are given below.
Step 1: The initial population of B search agents is initialized.
Step 2: The parameters of B, B c , B m , T are initialized.Here, B is the count of members in the population matrix A.
Step 3: The initial population is created as per Equation (20).
Here, y i,d is the d th problem variable.
Step 4: The fitness of the search agents is computed as per Equation (21).
Step 5: Using Equations (22) and (23), update the sorted population matrix A S . Here, the i th member of the sorted population matrix is denoted as y S i,d . In addition, Obj S is the sorted objective function-based vector.
Step 6: Using Equation (24), the mice population is chosen.
Step 8: Here, M, B m , M i , C, B c , and C j point to the mice population, the count of mice, the i th mouse, the cat population, the count of cats, and the j th cat, respectively.
Step 9: The position update of cats is modeled as in Equation (26), where C new j points to the new position of the j th cat and C new j,d is its new value for the d th problem variable. In addition, r is a random value within the limit [0, 1]. Here, I is computed as in Equation (27), where rand is a random number: I = round(1 + rand).
Step 10: If j = B c , then H i is created using Equation (28).
Step 11: Then, the position update of mice takes place based on Equations (29) and (30). Conventionally, M i is updated as shown in Equation (30); however, as per the SU-CMO model, M i is updated based upon the parameters ra 1 and ra 2 as in Equations (31) and (32).
Here, ra 1 and ra 2 are assigned values of 1.25 and 1.75.
Step 12: (a) In case the above condition is not satisfied, then increase j by 1, and again update C j . (b) Terminate the if condition.
Step 13: If the above condition is not satisfied, then increase i by 1. (c) End if.
Step 14: If t = T, then the best solution acquired so far is returned.
Step 15: If t < T, then increase t by 1 and move back to step 8.
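To make the flow of the steps above concrete, the following is a minimal sketch of a conventional CMBO-style loop, written under several assumptions: the better half of the sorted population is treated as mice and the rest as cats, cats move toward a randomly chosen mouse, mice move toward a haven drawn from the population with a sign term that depends on the fitness comparison, and a move is kept only if it improves the objective. The SU-CMO update with ra 1 and ra 2 is not reproduced here; the function name `cmbo_minimize`, its parameters, and the sphere objective are hypothetical stand-ins for the classification-error objective that would be used when tuning Bi-LSTM weights.

```python
import random


def cmbo_minimize(objective, dim, lower, upper, pop_size=20, iterations=25, seed=0):
    """Conventional CMBO-style search (sketch); returns (best_point, best_value)."""
    rng = random.Random(seed)

    def clip(value):
        return max(lower, min(upper, value))

    def move(agent, target, direction=1.0):
        r = rng.random()                       # random value in [0, 1]
        intensity = round(1 + rng.random())    # I = round(1 + rand), i.e., 1 or 2
        return [clip(a + r * (t - intensity * a) * direction)
                for a, t in zip(agent, target)]

    population = [[rng.uniform(lower, upper) for _ in range(dim)]
                  for _ in range(pop_size)]
    best = min(population, key=objective)

    for _ in range(iterations):
        population.sort(key=objective)         # sorted population matrix
        n_mice = pop_size // 2
        mice, cats = population[:n_mice], population[n_mice:]

        for j, cat in enumerate(cats):         # cats chase a random mouse
            candidate = move(cat, rng.choice(mice))
            if objective(candidate) < objective(cat):
                cats[j] = candidate

        for i, mouse in enumerate(mice):       # mice escape toward a haven
            haven = rng.choice(population)
            gap = objective(mouse) - objective(haven)
            direction = (gap > 0) - (gap < 0)  # sign of the fitness difference
            candidate = move(mouse, haven, direction)
            if objective(candidate) < objective(mouse):
                mice[i] = candidate

        population = mice + cats
        best = min(population + [best], key=objective)

    return best, objective(best)


def sphere(point):
    """Stand-in objective; a real run would score candidate network weights."""
    return sum(v * v for v in point)


best_point, best_value = cmbo_minimize(sphere, dim=3, lower=-5.0, upper=5.0)
print(best_point, best_value)
```

Because a move is accepted only when it lowers the objective, the best cost never increases from one iteration to the next, which is the property examined in a convergence study.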
Future Internet 2022, 14, x FOR PEER REVIEW

Simulation Setup
The presented HC + SU-CMO scheme for the detection of attacks was implemented in Python. The effectiveness of the proposed method was evaluated against HC + ALO [27], HC + AO [28], HC + BOA [29], HC + CMBO [26], HC + SSA [30], NN [31], RNN [32], Bi-GRU [25], SVM [33], and KNN [33] concerning popularly used metrics such as NPV, accuracy, and FPR, along with other widely used metrics. The convergence study was carried out by varying the number of iterations from 0 to 25 with an interval of 5. The datasets from [34,35] were used for the analysis; they are referred to as dataset 1 and dataset 2 in the description. Dataset 1 has only a single attack category, DDoS, whereas dataset 2 has multiple attack categories such as backdoor, DDoS, exploit, fuzzers, reconnaissance, shellcode, and worms. However, these attack categories fall under a single classification output, i.e., "Attack".
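For reference, the confusion-matrix-based metrics named above can be computed as in the following sketch. It assumes a binary task with "Attack" as the positive class and uses the standard definitions of accuracy, FPR, NPV, F-measure, and MCC; the function name `detection_metrics` and the example labels are illustrative assumptions rather than outputs of the experiments.

```python
import math


def detection_metrics(y_true, y_pred, positive="Attack"):
    """Compute common binary detection metrics from true and predicted labels."""
    tp = fp = tn = fn = 0
    for truth, prediction in zip(y_true, y_pred):
        if prediction == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "fpr": ratio(fp, fp + tn),
        "npv": ratio(tn, tn + fn),
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Made-up labels for illustration only.
y_true = ["Attack"] * 4 + ["Normal"] * 4
y_pred = ["Attack", "Attack", "Attack", "Normal",
          "Normal", "Normal", "Normal", "Attack"]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

MCC uses all four cells of the confusion matrix, so it stays informative when the "Attack" and "Normal" classes are imbalanced.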

Performance Analysis
The performance of the suggested HC + SU-CMO is compared with extant classification and optimization models regarding varied metrics. The evaluations were set up using the datasets in [34,35], and the relevant outcomes are plotted in Figures 3-7. The suggested HC + SU-CMO model is compared with HC + ALO, HC + AO, HC + BOA, HC + CMBO, HC + SSA, NN, RNN, Bi-GRU, SVM, and KNN for LPs ranging from 60 to 80 with an interval of 10. The assessment of the adopted scheme over the traditional approaches for accuracy and MCC is shown in Figures 3 and 5, and the assessment for F-measure and rand index is shown in Figures 4 and 6, for dataset 1 and dataset 2, respectively. In this evaluation, the HC + SU-CMO model achieved superior results compared to the other schemes. As Figures 3a and 5a show, the accuracy of the HC + SU-CMO model is higher than that of the other schemes for dataset 1 and dataset 2. In particular, better outputs are attained at the 60th and 70th LP for the proposed scheme regarding accuracy for both datasets. The superiority of the developed model is thus demonstrated over the HC + ALO, HC + AO, HC + BOA, HC + CMBO, HC + SSA, NN, RNN, Bi-GRU, SVM, and KNN models.
The proposed model shows a good accuracy level at the 60th and 70th learning rates; however, at the 80th learning rate, it shows a lower accuracy level than at the lower learning rates. Bi-GRU performs poorly on both metrics, accuracy and MCC.
F-measure is a classification evaluation measure defined as the harmonic mean of recall and precision. Figure 4a depicts the f-measure of the adopted model, which produces a positive outcome at the 60th and 70th learning rates.
The Rand Index is another widely used measure. It is a measurement of the degree to which two data clusterings are like one another. The adopted model's rand index for dataset 1 is illustrated in Figure 4b. The measure is nearly constant across all learning rates, which is preferable to other existing schemes.
The metrics findings for dataset 2 are likewise satisfactory. Figure 5a,b demonstrates the accuracy and MCC of the model used for dataset 2. Accuracy is almost consistent for all given learning rates, whilst MCC also outperforms other known techniques at all learning rates in the graph. SVM, on the other hand, exhibits worse accuracy than other approaches across all learning rates, as well as very poor outcomes in the MCC measure.
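As a small illustration of the Rand index, the sketch below uses its pairwise definition: the fraction of sample pairs on which two labelings agree, that is, pairs placed together in both or apart in both. The function name `rand_index` and the example labelings are illustrative assumptions.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs treated consistently by the two labelings."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


truth = [0, 0, 1, 1]
same_grouping = [1, 1, 0, 0]   # same partition with swapped label names
mixed_grouping = [0, 1, 0, 1]
print(rand_index(truth, same_grouping))
print(rand_index(truth, mixed_grouping))
```

The index depends only on how samples are grouped, not on the label names, so swapping the two label values leaves it unchanged.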
After the proposed model, KNN and NN also perform well for dataset 2 in both metrics, i.e., accuracy and MCC.
Figure 6a,b illustrates the F-measure and rand index for the adopted model, as well as other known techniques for dataset 2. The adopted model also performs well in these metrics over the whole learning rate.SVM performs very poorly at the 60th learning rate and improves at the 80th learning rate for the f-measure metric.
All the techniques show outcomes over 60%, the adopted model having results above 80%, outperforming the other known techniques in the paper.Furthermore, we can see that no techniques underperform in the rand index metric.

Convergence Study
The convergence study of the suggested SU-CMO scheme in contrast with traditional schemes such as ALO, AO, BOA, CMBO, and SSA for varying iterations is illustrated in Figure 7. In this scenario, the cost is computed by varying the number of iterations. The suggested SU-CMO attained the lowest cost values from the 13th to the 25th iteration in comparison with the others. Likewise, SU-CMO showed slightly higher cost from the 0th to the 12th iteration, although the values remain lower than those of the compared schemes, namely ALO, AO, BOA, CMBO, and SSA. Meanwhile, the existing CMBO demonstrated the worst outcome, i.e., a high cost value, in almost all iterations when contrasted with the alternative schemes such as ALO, AO, BOA, and SSA. Therefore, improved convergence is achieved by SU-CMO over the alternative schemes.

Figure 1 depicts the overall framework of the proposed model.

Figure 1. The overall framework of the adopted model.

Figure 3. Performance of the adopted model over other existing schemes for dataset 1 (a) Accuracy (b) MCC.

Figure 4. Performance of the adopted model over other existing schemes for dataset 1 (a) F-measure (b) Rand Index.

Figure 6. Performance of the adopted model over other existing schemes for dataset 2 (a) F-measure (b) Rand Index.

Table 1. Reviews on conventional IoT attack detection models.