3.2. Data Preprocessing
Generally, intrusion detection is performed through data analysis. The data fall into two categories, continuous and discrete, and each category requires a different preprocessing operation.
For continuous data, the data are usually normalized before dimensionality reduction and classification to reduce the impact of differing feature scales on the experimental results. Normalization is the process of transforming a dimensional expression into a dimensionless one. When principal component analysis (PCA) or the LDA algorithm is used for dimensionality reduction, a covariance calculation is often involved. The Z-score normalization method eliminates the effects of dimensional variance and covariance, and it performs better than other normalization methods, so we use it here. Its formula is
$$z=\frac{x-\mu }{\sigma },$$
where
x is the original data input;
z is the normalized data output; and
$\mu $ and
$\sigma $ represent the mean and standard deviation of each dimension of the original dataset, respectively.
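As a concrete illustration, Z-score normalization can be applied column-wise to a feature matrix. The following is a minimal NumPy sketch; the function name and the sample values are ours, not from the paper:

```python
import numpy as np

def zscore(X):
    """Normalize each feature column to zero mean and unit variance."""
    mu = X.mean(axis=0)       # per-dimension mean
    sigma = X.std(axis=0)     # per-dimension standard deviation
    sigma[sigma == 0] = 1.0   # guard against constant features
    return (X - mu) / sigma

# Illustrative data: two features on very different scales.
X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
Z = zscore(X)
```

After this transformation every column has mean 0 and standard deviation 1, so no single feature dominates the covariance computation in the reduction step.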
For discrete data, we use one-hot encoding, which expands the data features. It not only improves the nonlinear capability of the algorithm model, but also requires no normalization of the resulting parameters.
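For example, a discrete feature with three possible values expands into three binary features. A minimal sketch of the encoding (the category names are illustrative, not from the paper's dataset):

```python
def one_hot(values):
    """Encode a list of discrete values as one-hot vectors."""
    categories = sorted(set(values))                 # fixed category order
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)                  # all-zero vector
        vec[index[v]] = 1                            # set the matching slot
        vectors.append(vec)
    return categories, vectors

cats, encoded = one_hot(["tcp", "udp", "tcp", "icmp"])
```

Each discrete value becomes a binary indicator vector, so the single feature expands into one feature per category with no scaling required.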
3.3. The Proposed Algorithm
The basic idea of ILECA is to use a similarity measure function of the high-dimensional data space as a weight to improve the between-class scatter matrix. ILECA combines this with LDA to maximize the between-class distance and minimize the within-class distance, so as to obtain the optimal transformation matrix and reduce the dimensionality of the original data. Finally, ILECA uses the ELM classification algorithm to classify the data and determine the security status of the IoT devices.
Given a set D containing N training samples, $D=\left\{{x}_{k},{t}_{k}\right\},k=1,2,\cdots ,N$. Suppose ${x}_{ij}\in \left\{{x}_{k}\right\},{t}_{ij}\in \left\{{t}_{k}\right\},i=1,2,\cdots ,c,j=1,2,\cdots ,{n}_{i}$, where ${x}_{ij}$ is the jth sample feature vector of the ith class and ${t}_{ij}$ is the sample label corresponding to ${x}_{ij}$. The sample features have dimension d, so the total sample feature matrix can be expressed as ${X}^{N\times d}$; the samples have a total of c classes; and ${n}_{i}$ represents the number of samples of class i, i.e., $N={\sum}_{i=1}^{c}{n}_{i}$.
The total sample mean vector u and the class mean vector ${u}_{i}$ of the ith class are, respectively,
$$u=\frac{1}{N}\sum_{i=1}^{c}\sum_{j=1}^{{n}_{i}}{x}_{ij},\qquad {u}_{i}=\frac{1}{{n}_{i}}\sum_{j=1}^{{n}_{i}}{x}_{ij}.$$
Moreover, the within-class scatter matrix, the between-class scatter matrix, and the objective function of the transformation matrix are defined as follows.
Definition 1. The within-class scatter matrix ${S}_{w}$ is expressed as
$${S}_{w}=\frac{1}{N}\sum_{i=1}^{c}\sum_{j=1}^{{n}_{i}}\left({x}_{ij}-{u}_{i}\right){\left({x}_{ij}-{u}_{i}\right)}^{T}.$$
The within-class scatter matrix is the mean square of the distances between the samples of each class and their class center, and it represents the degree of dispersion within each class.
Definition 2. The between-class scatter matrix ${S}_{b}$ is expressed in terms of a high-dimensional data spatial similarity measurement function ${f}_{ij}$, which represents the spatial similarity of the class centers ${\mu}_{i}$ and ${\mu}_{j}$; ${\mu}_{i,k}$ and ${\mu}_{j,k}$ represent the mean values of classes i and j in dimension k, respectively; d is the feature dimension of the data; and ${n}_{i}$ and ${n}_{j}$ represent the number of samples of classes i and j, respectively. The between-class scatter matrix ${S}_{b}$ reflects the average of the distances between the centers of the classes, weighted by their spatial similarities, and the center of the total sample; it represents the dispersion between classes. The range of the high-dimensional data spatial similarity measurement function is (0, 1].
Definition 3. The objective function of the optimal transformation matrix ${A}^{*}$ is expressed as
$${A}^{*}=\underset{A}{\arg\max }\,\frac{{A}^{T}\left({S}_{b}-{S}_{w}\right)A}{{A}^{T}IA},$$
where A is the projection matrix and I is the identity matrix. According to the extreme value of the generalized Rayleigh quotient, we calculate the eigenvectors ${a}_{1},{a}_{2},\cdots ,{a}_{m}$ corresponding to the first m eigenvalues ${\lambda}_{1}>{\lambda}_{2}>\cdots >{\lambda}_{m}$ of ${I}^{-1}({S}_{b}-{S}_{w})$ and combine them into a matrix to obtain the optimal transformation matrix ${A}^{*}$, with $m=c-1$. Finally, the dimensionality-reduced sample feature vector is obtained through the matrix calculation
$${y}_{k}={{A}^{*}}^{T}{x}_{k},$$
where ${y}_{k}$ is the feature vector corresponding to ${x}_{k}$ after dimensionality reduction, and the dimensionality-reduced sample feature matrix is expressed as ${Y}^{N\times m}$. After the dimensionality reduction, the N samples are transformed into a sample set with new features ${D}^{\prime}=\{{y}_{k},{t}_{k}\},k=1,2,\dots ,N$, where ${y}_{k}={[{y}_{k1},{y}_{k2},\cdots ,{y}_{km}]}^{T}$ is the m-dimensional feature vector of the dimensionality-reduced data, ${t}_{k}={[{t}_{k1},{t}_{k2},\cdots ,{t}_{kc}]}^{T}$ is the sample label, and the samples have c classes.
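As a sketch of this reduction step, the following NumPy code uses the standard (unweighted) LDA scatter matrices; ILECA additionally weights ${S}_{b}$ by the similarity measure ${f}_{ij}$, which is omitted here, so this is an illustrative stand-in rather than the paper's exact method:

```python
import numpy as np

def lda_reduce(X, labels, m):
    """Project X (N x d) onto the m leading eigenvectors of S_b - S_w.

    Plain LDA scatters are used; ILECA additionally weights S_b by a
    pairwise spatial-similarity measure f_ij (not reproduced here).
    """
    N, d = X.shape
    u = X.mean(axis=0)                         # total sample mean
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        uc = Xc.mean(axis=0)                   # class mean
        Sw += (Xc - uc).T @ (Xc - uc)          # within-class scatter
        diff = (uc - u).reshape(-1, 1)
        Sb += len(Xc) * diff @ diff.T          # between-class scatter
    # Under the constraint A^T A = I, the objective reduces to the
    # eigenproblem of the symmetric matrix S_b - S_w.
    eigvals, eigvecs = np.linalg.eigh(Sb - Sw)
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues descending
    A = eigvecs[:, order[:m]]                  # transformation matrix A*
    return X @ A, A
```

The returned matrix `A` plays the role of ${A}^{*}$, and `X @ A` gives the reduced feature matrix ${Y}^{N\times m}$.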
As shown in
Figure 3, the new sample set obtained after dimensionality reduction is input into a single-hidden-layer neural network. The network with L hidden-layer nodes can be expressed as
$$\sum_{i=1}^{L}{\beta}_{i}\,g\left({w}_{i}^{T}\cdot {y}_{k}+{b}_{i}\right)={o}_{k},\quad k=1,2,\cdots ,N,$$
where
${w}_{i}={[{w}_{i1},{w}_{i2},\cdots ,{w}_{im}]}^{T}$ is the input weight between the
ith hidden layer node and the input layer node,
${b}_{i}$ is the offset of the
ith hidden layer node,
${\beta}_{i}$ is the output weight between the
ith hidden layer node and the output layer node,
$g\left(x\right)$ is the activation function, and
${w}_{i}^{T}\cdot {y}_{k}$ is the inner product of
${w}_{i}^{T}$ and
${y}_{k}$. The input weight
${w}_{i}$ and offset
${b}_{i}$ in the function are random numbers between (−1, 1) or (0, 1).
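The hidden-layer mapping can be sketched as follows, assuming a sigmoid activation for $g(x)$ and random weights drawn from (−1, 1); all sizes are illustrative:

```python
import numpy as np

def hidden_output(Y, W, b):
    """Compute the ELM hidden-layer output matrix H (N x L).

    Y: N x m reduced features; W: L x m input weights; b: L offsets.
    Entry H[k, i] = g(w_i . y_k + b_i) with a sigmoid activation g.
    """
    g = lambda x: 1.0 / (1.0 + np.exp(-x))    # sigmoid activation
    return g(Y @ W.T + b)

rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 3))               # 5 samples, m = 3 features
L = 4                                         # number of hidden nodes
W = rng.uniform(-1, 1, (L, 3))                # random input weights w_i
b = rng.uniform(-1, 1, L)                     # random offsets b_i
H = hidden_output(Y, W, b)
```

Because $W$ and $b$ are drawn once at random and never trained, computing $H$ is a single matrix multiplication followed by the element-wise activation.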
To minimize the error between the network output and the label of the corresponding sample data, an objective function is established as
$$\underset{\beta}{\min}\sum_{k=1}^{N}\left\Vert {o}_{k}-{t}_{k}\right\Vert,$$
that is, there exist ${\beta}_{i}$, ${w}_{i}$, and ${b}_{i}$ such that
$$\sum_{i=1}^{L}{\beta}_{i}\,g\left({w}_{i}^{T}\cdot {y}_{k}+{b}_{i}\right)={t}_{k},\quad k=1,2,\cdots ,N.$$
The above N equations can be expressed in matrix form as
$$H\beta =T,$$
where
H is the $N\times L$ output matrix of the hidden-layer nodes, whose entry in row k and column i is $g({w}_{i}^{T}\cdot {y}_{k}+{b}_{i})$,
$\beta $ is the output weight matrix, and
T is the expected output.
According to Equation (13), as long as the input weight ${w}_{i}$ and the offset ${b}_{i}$ are randomly determined, the output matrix H is uniquely determined. The Moore–Penrose generalized inverse ${H}^{\dagger}$ of H is then used to determine the minimum-norm least-squares solution $\beta$ [25, 26]:
$$\beta ={H}^{\dagger}T.$$
As can be seen from Equation (14), to obtain better generalization, a positive value $I/C$ is added to the diagonal of $H{H}^{T}$ or ${H}^{T}H$; this repairs the matrix and ensures that it has full rank. The classifier training process is given as follows.
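A minimal sketch of this regularized solution, using the ${H}^{T}H$ form (appropriate when $N>L$); the regularization constant C and the test matrices are illustrative assumptions:

```python
import numpy as np

def solve_beta(H, T, C=1.0):
    """Minimum-norm least-squares output weights with ridge term I/C.

    Solves (H^T H + I/C) beta = H^T T, the regularized normal equations.
    """
    L = H.shape[1]
    return np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ T)

rng = np.random.default_rng(1)
H = rng.standard_normal((10, 4))   # hidden-layer outputs: 10 samples, L = 4
T = rng.standard_normal((10, 2))   # expected outputs for 2 classes
beta = solve_beta(H, T, C=100.0)
```

Adding $I/C$ to the diagonal makes ${H}^{T}H+I/C$ strictly positive definite, so the linear solve always succeeds even when ${H}^{T}H$ is rank-deficient.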
As shown in
Figure 4, we reduce the training data to generate the transformation matrix ${A}^{*}$ and input the dimensionality-reduced training data into the ELM classifier to calculate the final weight $\beta$. The dimensionality-reduced test data are then input into the ELM classifier for classification, and the prediction results for the test data are output.
Figure 5 shows the flow chart of ILECA. The specific process of ILECA is described as follows, and the ILECA pseudocode is shown in Algorithm 1.
Algorithm 1: ILECA
Input: training set $D=\left\{{x}_{k},{t}_{k}\right\},k=1,2,\dots ,N$; test set $DT=\left\{T{x}_{k},T{t}_{k}\right\},k=1,2,\dots ,{n}_{t}$
Output: expected classification matrix T
1: formulate the feature matrix X for D
2: $X=Zscore\left(X\right)$
3: calculate ${S}_{b}$, ${S}_{w}$, and ${S}_{b}-{S}_{w}$
4: obtain ${A}^{*}$ by solving the eigenproblem of ${I}^{-1}({S}_{b}-{S}_{w})$
5: calculate $Y=X{A}^{*}$ and obtain the new training data ${D}^{\prime}=\left\{{y}_{k},{t}_{k}\right\}$
6: generate ${w}_{i}$ and ${b}_{i}$ randomly, and set the number of hidden neurons L
7: calculate the output of the hidden neurons H according to Equation (13)
8: calculate the output weight of the classifier $\beta$ according to Equation (14)
9: formulate the feature matrix ${X}_{t}$ for $DT$
10: ${X}_{t}=Zscore\left({X}_{t}\right)$
11: ${Y}_{t}={X}_{t}{A}^{*}$
12: calculate the output of the hidden neurons ${H}_{t}$ for the test data according to Equation (13)
13: $T={H}_{t}\beta$ according to Equation (12)
14: return T
Step 1: Perform Z-score normalization on the training samples according to Equation (1).
Step 2: Calculate the within-class scatter matrix ${S}_{w}$ according to Equation (4), and calculate the between-class scatter matrix ${S}_{b}$ according to Equation (5).
Step 3: Establish the objective function according to Equation (7), calculate ${I}^{-1}({S}_{b}-{S}_{w})$, and solve the eigenproblem to obtain the eigenvalues and eigenvectors. Take the eigenvectors corresponding to the m largest eigenvalues as the transformation matrix ${A}^{*}$, with $m=c-1$.
Step 4: Calculate $Y=X{A}^{*}$ according to Equation (8), and obtain the new training data ${D}^{\prime}=\left\{{y}_{k},{t}_{k}\right\}$.
Step 5: Generate ${w}_{i}$ and ${b}_{i}$ randomly, and set the number of hidden neurons L.
Step 6: Calculate the output of the hidden neurons H according to Equation (13).
Step 7: Calculate the output weight of the classifier $\beta$ according to Equation (14).
Step 8: Calculate ${Y}_{t}={X}_{t}{A}^{*}$.
Step 9: Calculate the output of the hidden neurons ${H}_{t}$ for the test data according to Equation (13).
Step 10: Calculate the output for the test data by Equation (12) with ${H}_{t}$ and $\beta$.
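The ten steps above can be wired together into one compact end-to-end sketch. As before, the standard LDA scatters stand in for ILECA's similarity-weighted ${S}_{b}$, a sigmoid is assumed for $g$, and all parameter values are illustrative:

```python
import numpy as np

def fit_predict(X, t, Xt, L=20, C=100.0, seed=0):
    """Illustrative ILECA-style pipeline: Z-score -> reduce -> ELM classify."""
    rng = np.random.default_rng(seed)
    # Step 1 (and 8's precondition): Z-score using training statistics.
    mu, sigma = X.mean(0), X.std(0)
    sigma[sigma == 0] = 1.0
    X, Xt = (X - mu) / sigma, (Xt - mu) / sigma
    # Steps 2-4: scatter matrices, eigenproblem, transformation matrix A*.
    d = X.shape[1]
    u = X.mean(0)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    classes = np.unique(t)
    for c in classes:
        Xc = X[t == c]
        uc = Xc.mean(0)
        Sw += (Xc - uc).T @ (Xc - uc)
        dv = (uc - u).reshape(-1, 1)
        Sb += len(Xc) * dv @ dv.T
    eigvals, V = np.linalg.eigh(Sb - Sw)
    A = V[:, np.argsort(eigvals)[::-1][: len(classes) - 1]]  # m = c - 1
    Y, Yt = X @ A, Xt @ A                                     # Steps 4 and 8
    # Steps 5-7: random hidden layer, then solve the output weights beta.
    m = Y.shape[1]
    W = rng.uniform(-1, 1, (L, m))
    b = rng.uniform(-1, 1, L)
    g = lambda z: 1.0 / (1.0 + np.exp(-z))
    H = g(Y @ W.T + b)
    T = np.eye(len(classes))[np.searchsorted(classes, t)]     # one-hot labels
    beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ T)
    # Steps 9-10: hidden output for test data, then classify.
    Ht = g(Yt @ W.T + b)
    return classes[np.argmax(Ht @ beta, axis=1)]
```

Only ${A}^{*}$, $W$, $b$, and $\beta$ are carried from training to testing, matching the flow in Figure 4.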