3.3. Processes Involved in the SCAVO-FS Technique
In this work, the SCAVO-FS system is derived to select the optimal feature subsets. The AVO algorithm is inspired by the navigational and foraging behaviors of African vultures [19]. The biological nature of these vultures with regard to competing for and searching for food is outlined in four phases as follows. Consider $N$ vultures in the atmosphere, which describes the population size, i.e., $i = 1, 2, \ldots, N$. In the following equation, the fitness function of every location is evaluated.
Here, $p_i$ represents the probability of choosing either the first or the second group, and $F_i$ denotes the fitness function of the $i$-th location [19]. Next, the formation of the first and second groups during all the iterations is attained as follows.
Here, $L_1$ and $L_2$ are parameters that both lie in the range of $[0, 1]$. The satiated vulture with sufficient energy can move a long distance to find food, whereas a hungry one cannot fly as long, as denoted below.
In Equations (5) and (6), $F$ denotes the satiation rate (energy) of the vulture, $it$ and $max_{it}$ denote the present and the maximal iterations, respectively, $z$ and $h$ indicate random numbers in the ranges of $[-1, 1]$ and $[-2, 2]$, respectively, and $w$ describes the probability of entering the exploration phase.
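To make these steps concrete, the following Python sketch illustrates the group selection and the satiation rate under the standard AVO formulation of [19]; the function names, the roulette form of the selection probability, and the exponent w are assumptions rather than the paper's exact Equations (1)–(6).

```python
import numpy as np

def select_reference_vulture(best1, best2, fit1, fit2):
    """Roulette-wheel choice between the best vultures of the first and second
    groups, using p_i = F_i / (F_1 + F_2) as the selection probability (assumed form)."""
    p1 = fit1 / (fit1 + fit2)
    return best1 if np.random.rand() < p1 else best2

def satiation_rate(it, max_it, w=2.5):
    """Satiation (energy) rate F: a large |F| lets the vulture travel far for food,
    a small |F| keeps it close; w (assumed value) raises the chance of late exploration."""
    z = np.random.uniform(-1.0, 1.0)          # random number in [-1, 1]
    h = np.random.uniform(-2.0, 2.0)          # random number in [-2, 2]
    t = h * (np.sin(np.pi / 2 * it / max_it) ** w
             + np.cos(np.pi / 2 * it / max_it) - 1)
    return (2.0 * np.random.rand() + 1.0) * z * (1.0 - it / max_it) + t
```

In the standard AVO formulation of [19], the vultures explore when $\left|F\right| \geq 1$ and exploit otherwise, which is consistent with the thresholds used in the following phases.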
The procedure of seeking food by African vultures defines the exploration stage, in which the parameter $P_1$ governs the selection of the search strategy. In this expression, $P(i+1)$ represents the location vector of the vulture in the next iteration, and the distance term is given by $D(i) = \left|X \times R(i) - P(i)\right|$, where $X$ denotes a randomly generated value in the range of $[0, 2]$; $ub$ and $lb$ denote the upper and lower boundaries of the search space, respectively, while $rand_2$ and $rand_3$ indicate randomly generated numbers that lie in the range of 0 to 1.
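A minimal sketch of this exploration step, assuming the standard AVO update of [19] (the reference vulture $R$, the satiation rate $F$, and the bounds $lb$/$ub$ follow the definitions above; the function and argument names are illustrative):

```python
import numpy as np

def exploration_step(position, R, F, lb, ub, P1=0.6):
    """Exploration (sketch): either move toward the reference vulture R using the
    distance D(i) = |X * R - P|, or jump to a random point inside [lb, ub]."""
    if P1 >= np.random.rand():
        X = 2.0 * np.random.rand()                      # random coefficient in [0, 2]
        D = np.abs(X * R - position)
        return R - D * F
    rand2, rand3 = np.random.rand(), np.random.rand()   # random numbers in [0, 1]
    return R - F + rand2 * ((ub - lb) * rand3 + lb)
```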
The exploitation phase includes two stages with dissimilar strategies. The selection of either method relies on the $P_2$ and $P_3$ parameters, whose values lie in the range of 0 to 1. If $\left|F\right|$ ranges between 0.5 and 1, then the exploitation phase enters the first stage, which comprises the siege fight and rotating flight strategies.
Here, $rand_{P_2}$ denotes a randomly generated value in the range of $[0, 1]$. The solution to Equation (11) is given below.
Next, the rotational flight of the vulture can be modeled as given below.
Here, $rand_5$ and $rand_6$ denote two randomly generated numbers in the range of $[0, 1]$.
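The two first-stage strategies can be sketched as follows, again assuming the standard AVO equations of [19]; the helper name and the default value of $P_2$ are illustrative assumptions.

```python
import numpy as np

def exploitation_stage1(position, R, F, P2=0.4):
    """First exploitation stage (sketch): siege fight when P2 wins the random draw,
    otherwise a rotating (spiral) flight around the reference vulture R."""
    if P2 >= np.random.rand():
        # siege fight: approach the reference vulture using the distance terms
        D = np.abs(2.0 * np.random.rand() * R - position)
        d = R - position
        return D * (F + np.random.rand()) - d
    # rotating flight: spiral motion built from sine and cosine terms
    rand5, rand6 = np.random.rand(), np.random.rand()
    S1 = R * (rand5 * position / (2.0 * np.pi)) * np.cos(position)
    S2 = R * (rand6 * position / (2.0 * np.pi)) * np.sin(position)
    return R - (S1 + S2)
```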
If $\left|F\right| < 0.5$, then the exploitation phase enters the second stage, which describes the aggressive siege and accumulation fight strategies for finding food. Based on the following condition, either strategy can be selected.
where $BestVulture_1(it)$ and $BestVulture_2(it)$ represent the best vultures of the first and second groups, respectively, as shown below.
Here, $d$ signifies the dimension of the problem, and $Levy(d)$ is evaluated as given below.
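The second-stage strategies, including the Levy flight, can be sketched as follows under the same assumptions; the Mantegna-style Levy step and the choice of $BestVulture_1$ as the siege reference follow the standard AVO description in [19], and the small epsilon is added only to avoid division by zero.

```python
import numpy as np
from scipy.special import gamma

def levy_flight(dim, beta=1.5):
    """Levy(d) step vector for a problem of dimension d (standard Mantegna form)."""
    sigma = (gamma(1.0 + beta) * np.sin(np.pi * beta / 2.0)
             / (gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0))) ** (1.0 / beta)
    u = np.random.randn(dim) * sigma
    v = np.random.randn(dim)
    return 0.01 * u / np.abs(v) ** (1.0 / beta)

def exploitation_stage2(position, best1, best2, F, P3=0.6):
    """Second exploitation stage (sketch, position as a 1-D array): accumulate the
    two best vultures, or perform an aggressive siege toward best1 via a Levy flight."""
    if P3 >= np.random.rand():
        eps = 1e-12                                   # guard against division by zero
        A1 = best1 - (best1 * position) / (best1 - position ** 2 + eps) * F
        A2 = best2 - (best2 * position) / (best2 - position ** 2 + eps) * F
        return (A1 + A2) / 2.0
    d = best1 - position
    return best1 - np.abs(d) * F * levy_flight(position.size)
```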
The best possible solution is not known at the initial stage of the AVO algorithm. Therefore, it is better to use a large step in the beginning, although such a step might drive the calculation process farther from the optimal location [20]. Consequently, a scaling variable is used to adjust the position update in the initial phase. In the SCAVO algorithm, the vulture is repositioned at the end of every iteration based on sine- and cosine-adapted scaling factors, as given below [20].
Now, SCaSF denotes the scaling factor, computed as follows.
In Equation (24), RNDI indicates a randomly generated value, the weighting factors determine the influence of each term, and $it$ and $max_{it}$ indicate the current and the maximal iteration, respectively. The scaling factor is used to control the vulture's movement at the initial stage. Different values of the weighting factors are tried in order to make a proper choice, and the best possible results are attained with an appropriately selected pair. The scaling factor alters the vulture's position during the early period of the search process, thereby further increasing the hunting capability of the AVO technique. The algorithm should be able to locate the best position between two locations in order to exploit the search space, and these cycles ensure the best exploitation and exploration capabilities of the algorithm.
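The following sketch gives one plausible sine/cosine-modulated scaling factor and a repositioning rule built on it; the functional form, the default weighting factors u1 and u2, and the function names are assumptions for illustration and should not be read as the exact Equation (24) of [20].

```python
import numpy as np

def sc_scaling_factor(it, max_it, u1=0.5, u2=0.5):
    """Illustrative sine/cosine-adapted scaling factor (assumed form, not Eq. (24)):
    a random value RNDI is modulated by weighted sine and cosine terms so that the
    repositioning step is large in early iterations and shrinks later on."""
    rndi = np.random.rand()
    phase = (np.pi / 2.0) * it / max_it
    return rndi * (u1 * np.sin(phase) + u2 * np.cos(phase))

def reposition(position, best_position, it, max_it):
    """Move the vulture toward the best position found so far, scaled by SCaSF."""
    return position + sc_scaling_factor(it, max_it) * (best_position - position)
```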
The fitness function considers both the number of selected features and the accuracy of the classifier: it aims to reduce the size of the selected subset while increasing the classification accuracy. Consequently, the following fitness function is used to evaluate each individual solution.
Here, $\alpha$, which is usually set to 0.9, is used to control the relative importance of the subset length and the classification quality. ErrorRate is the classification error rate obtained with the selected features, evaluated as the ratio of incorrect classifications to the total number of classifications made; its values lie in the range of 0 to 1, and it is the complement of the classification accuracy. $\left|S\right|$ denotes the number of selected features, and $\left|T\right|$ represents the overall number of features in the original data.
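A short sketch of this fitness evaluation, with alpha set to 0.9 as stated above; the function name and the example values are illustrative.

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.9):
    """Fitness of a candidate feature subset: weighted sum of the classification
    error rate and the fraction of features retained (lower is better)."""
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)

# example (assumed values): 12 of 42 features selected, 5% classification error
score = fs_fitness(error_rate=0.05, n_selected=12, n_total=42)
```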
3.4. Classification Model
For the accurate classification of the intrusions, the LSTM-AE model is leveraged for both identification and classification. The deep RNN (particularly the LSTM) model is the underlying structure of the DL model applied to time-series and sequential data in order to learn features and patterns. However, the LSTM [21], in contrast to the plain RNN, contains memory cells for pattern recognition that depend on both short- and long-term input data. These are beneficial in the detection and forecasting of outliers in time-series datasets [22]. The LSTM cell comprises three memory gates, namely the forget, input (update), and output gates.
In Equation (26), $b$ and $W$ denote the bias and the weight of the LSTM, respectively. Furthermore, the dataset fed as input to the LSTM cells is updated by the input gate $i_t$. Finally, the output gate plays its role by transmitting the generated data to the output cell $y_t$ or to the succeeding state vector $h_t$. In this expression, $x_t$ stands for the input dataset, $h_{t-1}$ denotes the prior short-term state, and $b$ and $W$ represent the bias and weight matrices, respectively. Meanwhile, the LSTM model has a state vector that involves the sigmoid and $\tanh$ functions. The model evaluates the gradient error at every time point, and the derivative terms of the sigmoid and $\tanh$ functions are additive, which prevents the model from suffering from the vanishing gradient problem. Unlike other gradient descent models, the LSTM exploits a mini-batch method when training the data samples. Usually, the LSTM model comprises a single layer of cells, whereas stacking and combining numerous LSTM models increases the depth of the layers. This, in turn, increases the model's performance and accuracy in training; it also helps in identifying short- and long-term sequential patterns in large datasets.
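The gate computations described above can be illustrated with a minimal NumPy sketch of a single LSTM step; the stacked weight layout and the dimensions are assumptions rather than the paper's Equation (26).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step: the forget, input (update), and output gates mix the
    current input x_t with the previous short-term state h_prev; W and b are the
    stacked weight matrix and bias vector (4 * hidden_dim rows, assumed layout)."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    n = h_prev.shape[0]
    f_t = sigmoid(z[:n])              # forget gate
    i_t = sigmoid(z[n:2 * n])         # input (update) gate
    o_t = sigmoid(z[2 * n:3 * n])     # output gate
    g_t = np.tanh(z[3 * n:])          # candidate cell state
    c_t = f_t * c_prev + i_t * g_t    # long-term (cell) state
    h_t = o_t * np.tanh(c_t)          # short-term state / cell output
    return h_t, c_t
```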
On top of that, the AE method is utilized with the LSTM to learn long-term patterns and the most important features. The AE is an unsupervised type of ANN that intends to learn the essential hidden representation of the dataset through encoding and decoding processes. The output and input datasets are compared to evaluate the difference between them; a large difference indicates a high reconstruction loss. Based on this output, it can be inferred that the model cannot properly reconstruct the given data, and accordingly, the data are recognized as irregular (anomalous).
The LSTM-AE is an application of the AE in which LSTM cells are employed in the encoder–decoder layers. This setup brings the advantages of both methods to time-series or sequential datasets. In this work, the LSTM-AE is employed because it provides certain advantages over the normal (regular) AE: for instance, the LSTM-AE can handle sequence data (time-series datasets) as input, whereas the normal AE cannot accept a sequential sample as the input dataset. Furthermore, the LSTM-AE accepts a wider range of input lengths (short- or long-term), whereas the regular AE accepts only a fixed-size input. Finally, the data dimension increases and the computation becomes complicated, since the long- and short-term temporal dependence on previous data affects the current IIoT data. To resolve these problems, the LSTM-AE is applied for the extraction of the fundamental features, with benefits over the normal AE. The structure of the AE is shown in
Figure 2.
The AE model comprises the input layer, the Hidden Layer (HL), and the output layer, whereas the interconnected layer is made up of LSTM cells that create the output-layer dataset. The HL takes samples from various sampling times to estimate and calculate their impact on the succeeding samples at another sampling period. The values at the next time point are obtained as the output by modeling and integrating this impact as the forecasted values. The respective sample of the HL is characterized by an equation in which $x_t$ is the input instance of the data sample, $W$ signifies the weight matrices, and $b$ characterizes the bias vector between the input layer and the HLs. The activation function $f$ updates the next input layer to produce $h_t$ as the output value in the AE architecture.
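As a concrete illustration of this encoder–decoder arrangement, the following Keras sketch builds a small LSTM-AE; the layer sizes, sequence length, and feature count are assumptions for illustration only and do not reflect the paper's experimental settings.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm_ae(time_steps=10, n_features=32, latent_dim=16):
    """Sequence-to-sequence autoencoder with LSTM cells in the encoder and decoder."""
    inputs = keras.Input(shape=(time_steps, n_features))
    # encoder: compress the input sequence into a latent representation
    encoded = layers.LSTM(latent_dim)(inputs)
    # repeat the latent vector so the decoder can rebuild every time step
    repeated = layers.RepeatVector(time_steps)(encoded)
    # decoder: reconstruct the sequence from the latent representation
    decoded = layers.LSTM(latent_dim, return_sequences=True)(repeated)
    outputs = layers.TimeDistributed(layers.Dense(n_features))(decoded)
    model = keras.Model(inputs, outputs)
    # a large reconstruction (MSE) loss flags a sample as anomalous
    model.compile(optimizer=keras.optimizers.RMSprop(), loss="mse")
    return model
```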
3.5. Hyperparameter Tuning Model
At last, the RMSProp optimizer is exploited for the hyperparameter tuning of the LSTM-AE model. The RMSProp optimizer [22] restricts the oscillation in the vertical direction; thus, the learning rate can be increased in the horizontal direction, and the algorithm takes larger steps to converge at a faster rate. The RMSProp calculation is given as follows, where the value of the momentum term is represented as $\beta$ and is set to 0.9 [22].
In the backward propagation process, $dW$ and $db$ are used to update the $W$ and $b$ parameters with the help of the following expression [22]:
Instead of independently using the $dW$ and $db$ values for all the epochs, the exponentially weighted averages of their squared values, $S_{dW}$ and $S_{db}$, are considered [22].
Here, $\beta$ represents a hyperparameter in the range of 0 to 1. The new weighted average is formed from the square of the current value, the weights, and the average of the previous values. The parameters are updated after the evaluation of the exponentially weighted averages [22]. Since $dW$ is relatively small, it is divided by $\sqrt{S_{dW}}$, which is also a comparatively small number, so the update in the horizontal dimension remains fast; in contrast, $db$ is relatively large, and dividing it by $\sqrt{S_{db}}$, a comparatively large number, slows down the update in the vertical dimension.
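The complete update can be summarized in the following sketch of the RMSProp rule described above; the learning rate and the epsilon term are illustrative assumptions.

```python
import numpy as np

def rmsprop_update(W, b, dW, db, S_dW, S_db, lr=0.001, beta=0.9, eps=1e-8):
    """One RMSProp step: maintain exponentially weighted averages of the squared
    gradients and divide the raw gradients by their square roots, which damps the
    vertical (b-direction) oscillations while keeping fast horizontal (W) progress."""
    S_dW = beta * S_dW + (1.0 - beta) * dW ** 2
    S_db = beta * S_db + (1.0 - beta) * db ** 2
    W = W - lr * dW / (np.sqrt(S_dW) + eps)
    b = b - lr * db / (np.sqrt(S_db) + eps)
    return W, b, S_dW, S_db
```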