Intrusion Detection of UAVs Based on the Deep Belief Network Optimized by PSO

With the rapid development of information technology, the problem of the network security of unmanned aerial vehicles (UAVs) has become increasingly prominent. In order to solve the intrusion detection problem of massive, high-dimensional, and nonlinear data, this paper proposes an intrusion detection method based on the deep belief network (DBN) optimized by particle swarm optimization (PSO). First, a classification model based on the DBN is constructed, and the PSO algorithm is then used to optimize the number of hidden layer nodes of the DBN, to obtain the optimal DBN structure. The simulations are conducted on a benchmark intrusion dataset, and the results show that the accuracy of the PSO-DBN algorithm reaches 92.44%, which is higher than those of the support vector machine (SVM), artificial neural network (ANN), deep neural network (DNN), and Adaboost. It can be seen from comparative experiments that the optimization effect of PSO is better than those of the genetic algorithm, the simulated annealing algorithm, and the Bayesian optimization algorithm. The PSO-DBN method provides an effective solution to the problem of intrusion detection in UAV networks.


Introduction
In recent years, with the rapid development of cloud computing and artificial intelligence technology, the Internet of Things technology has also ushered in vigorous development. Various intelligent devices can receive a large amount of information through data exchange and interconnection. The popularity of the Internet of Things technology and the intelligence of devices have brought great convenience to people, but the use of new technologies and smart devices has also brought new security and privacy risks [1][2][3]. As the Internet of Things nodes collect and store large amounts of user privacy data, Internet of Things systems have become important targets for cyber attackers. In this case, protecting personal privacy and data security is very important [4][5][6].
With the progress of technology and the continuous reduction in manufacturing costs, the Internet of Things system composed of unmanned aerial vehicles (UAVs) has entered industrial production and people's daily life from the military field. Nowadays, UAVs have been widely used in film and television shooting, agricultural monitoring, meteorological monitoring, forest fire detection, emergency rescue, and other fields. However, while UAVs bring various conveniences to our production and life, the network security problems they face have been gradually exposed [7,8].
When multiple UAVs cooperate to perform tasks, it is necessary to build information connection channels between them to form a mobile self-organizing network of UAVs. The UAVs in the network realize the real-time sharing of information through this mobile network, which no longer needs to be forwarded by a ground station, and this effectively improves the survivability and combat capability of the UAVs.


Intrusion Detection Based on the DBN
The detection model based on the DBN method is shown in Figure 1. The input layer includes five types of network data: Normal, Probing, DoS, U2R, and R2L. A DBN is a neural network model composed of multiple RBMs. When applying a DBN to intrusion detection, the network structure should be trained first, to determine the connection weights and neuron biases of the network. Training the DBN model mainly involves pre-training and reverse fine-tuning. First, each RBM layer is trained independently and unsupervised in the pre-training process, to ensure that as much feature information as possible is retained when the feature vectors are mapped to different feature spaces. Then, a BP network is set up in the last layer of the DBN; it receives the output eigenvector of the RBM as its input eigenvector, and supervised training is conducted for the entity relationship classifier. Each RBM layer can only guarantee that the weights within its own layer are optimal for that layer's feature-vector mapping, not for the whole DBN. Therefore, the BP network must propagate the error information, from top to bottom, to each RBM layer and fine-tune the DBN. The RBM training process can be regarded as the initialization of the weights of a deep BP network, which lets the DBN overcome the shortcoming of the BP network of easily falling into a local optimum due to the random initialization of weights.
A single RBM is a neural network model consisting of a visible layer and a hidden layer [45]. Figure 1 shows a network structure consisting of three RBM layers, where v is the visible layer, which receives the intrusion detection data; h is the hidden layer, which extracts the effective features of the input data; and W is the matrix of connection weights between the visible layer and the hidden layer. Neurons within the same layer are not connected to each other, while neurons in adjacent layers are connected by weights. The inactivated and activated states of a neuron in the network are represented by the binary values 0 and 1, respectively.
The RBM is an energy-based model [46], where v_i represents the state of neuron i in the visible layer, with corresponding bias a_i; h_j represents the state of neuron j in the hidden layer, with corresponding bias b_j; and w_{ij} is the connection weight between neurons i and j. The energy of the RBM can be expressed as

E(v, h \mid \theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i w_{ij} h_j

In the equation, \theta = \{w_{ij}, a_i, b_j\} is the set of RBM parameters, and n and m are the numbers of neurons in the visible layer and hidden layer, respectively.
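The energy function above can be checked numerically. The following is a minimal sketch with toy, hypothetical values for v, h, W, a, and b (none of these numbers come from the paper):

```python
import numpy as np

def rbm_energy(v, h, W, a, b):
    """Energy E(v, h | theta) of one RBM configuration:
    minus the visible bias term, hidden bias term, and interaction term."""
    return -float(a @ v) - float(b @ h) - float(v @ W @ h)

# Toy configuration: 3 visible units, 2 hidden units.
v = np.array([1.0, 0.0, 1.0])
h = np.array([1.0, 1.0])
W = np.array([[0.5, -0.2],
              [0.1, 0.3],
              [-0.4, 0.2]])
a = np.array([0.1, 0.0, -0.1])
b = np.array([0.2, 0.1])
print(rbm_energy(v, h, W, a, b))
```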
From the energy function, the joint probability distribution of (v, h) can be obtained as follows:

P(v, h \mid \theta) = \frac{1}{Z(\theta)} e^{-E(v, h \mid \theta)}, \quad Z(\theta) = \sum_{v, h} e^{-E(v, h \mid \theta)}

where Z(\theta) is the normalization factor (partition function). For a training set of N samples, the parameters \theta are obtained by maximizing the log-likelihood of the samples:

L(\theta) = \sum_{t=1}^{N} \ln P(v^{(t)} \mid \theta), \quad P(v \mid \theta) = \frac{1}{Z(\theta)} \sum_{h} e^{-E(v, h \mid \theta)}

In the process of training, due to the complexity of calculating the normalization factor Z(\theta), Gibbs sampling and other sampling methods are generally used to approximate it [47]. Hinton proposed a fast learning algorithm using contrastive divergence (CD) to train the network parameters, which improves the training efficiency and has promoted the development of the RBM. The CD algorithm calculates the states of the hidden neurons from the vector of visible neurons, then reconstructs the states of the visible neurons from the hidden neurons, and finally calculates the hidden states again from the reconstructed visible neurons, so that a new state of the hidden layer is obtained.
As the activation states of the neurons in the same layer of the RBM are independent of each other, the activation probability of the jth hidden neuron, computed from the states of the visible layer, is as follows:

P(h_j = 1 \mid v) = \sigma\left(b_j + \sum_{i=1}^{n} v_i w_{ij}\right)

where \sigma(x) = \frac{1}{1 + \exp(-x)} is the sigmoid activation function. The ith visible neuron is reconstructed from the hidden layer with activation probability

P(v_i = 1 \mid h) = \sigma\left(a_i + \sum_{j=1}^{m} w_{ij} h_j\right)

Further, the update equations of the RBM weights and bias parameters can be obtained as follows:

w_{ij}^{k+1} = w_{ij}^{k} + \varepsilon\left(\langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{recon}\right)
a_i^{k+1} = a_i^{k} + \varepsilon\left(\langle v_i \rangle_{data} - \langle v_i \rangle_{recon}\right)
b_j^{k+1} = b_j^{k} + \varepsilon\left(\langle h_j \rangle_{data} - \langle h_j \rangle_{recon}\right)

Here, \langle \cdot \rangle_{data} is the distribution defined by the original intrusion detection data, \langle \cdot \rangle_{recon} is the distribution defined by the reconstructed model, \varepsilon is the learning rate, k is the iteration number of the CD algorithm, w_{ij}^{k+1} is the updated weight matrix, and a_i^{k+1} and b_j^{k+1} are the updated bias vectors of the visible and hidden layers.
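As a concrete illustration of the CD training loop described above, the following is a minimal NumPy sketch of CD-1 for a single RBM. The toy data, layer sizes, and learning rate are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, eps=0.1):
    """One CD-1 update of RBM parameters from a batch of visible vectors v0."""
    # Positive phase: hidden activation probabilities given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
    # Reconstruction: visible probabilities given the sampled hidden states.
    pv1 = sigmoid(h0 @ W.T + a)
    # Negative phase: hidden probabilities given the reconstruction.
    ph1 = sigmoid(pv1 @ W + b)
    n = v0.shape[0]
    # Updates: <v h>_data - <v h>_recon, averaged over the batch.
    W += eps * (v0.T @ ph0 - pv1.T @ ph1) / n
    a += eps * (v0 - pv1).mean(axis=0)
    b += eps * (ph0 - ph1).mean(axis=0)
    return float(((v0 - pv1) ** 2).mean())  # mean squared reconstruction error

# Toy data: 20 binary samples, 6 visible units, 4 hidden units.
v = (rng.random((20, 6)) < 0.5).astype(float)
W = 0.01 * rng.standard_normal((6, 4))
a, b = np.zeros(6), np.zeros(4)
errors = [cd1_step(v, W, a, b) for _ in range(50)]
```

The returned reconstruction error is the quantity that Figure 5 tracks over RBM iterations.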

Parameter Optimization Based on the PSO Algorithm
The PSO algorithm is inspired by the predation behavior of bird flocks and is used to solve optimization problems. Each particle in the algorithm represents a potential solution to the problem, and each particle corresponds to a fitness value, which is determined by the fitness function. The velocity of a particle determines the direction and distance of its movement. The velocity is dynamically adjusted according to the movement experience of the particle itself and of the other particles, thus realizing the optimization of the individual within the solution space [48].
The PSO algorithm first initializes a group of particles in the solvable space, and in each iteration, the particles update themselves by tracking two extreme values. One is the optimal solution found by the particle itself, which is generally called the individual extreme value; the other is the current optimal solution, found by the whole population, which is generally called the global extreme value. The individual extreme value and global extreme value are updated continuously in the iteration process, and the final output global extreme value is the optimal solution, obtained by the algorithm [49].
It is supposed that, in a D-dimensional search space, the population consisting of n particles is X = (X_1, X_2, \cdots, X_n), where the ith particle is a D-dimensional vector, X_i = (x_{i1}, x_{i2}, \cdots, x_{iD})^T, which represents the position of the ith particle in the D-dimensional search space and is also a potential solution to the problem. According to the fitness function, the fitness value corresponding to the position of each particle can be calculated. The fitness function defined in this paper is as follows:

fitness = \frac{correct}{sum}

where correct represents the number of data that are correctly classified, and sum represents the total number of data.
Assuming that the velocity of the ith particle is V_i = (V_{i1}, V_{i2}, \cdots, V_{iD})^T, its individual extreme value is P_i = (P_{i1}, P_{i2}, \cdots, P_{iD})^T, and the global extreme value of the population is P_g = (P_{g1}, P_{g2}, \cdots, P_{gD})^T. In each iteration, the particle updates its velocity and position through the individual and global extreme values. The updating equations are as follows:

V_{id}^{k+1} = \omega V_{id}^{k} + c_1 r_1 \left(P_{id}^{k} - X_{id}^{k}\right) + c_2 r_2 \left(P_{gd}^{k} - X_{id}^{k}\right)
X_{id}^{k+1} = X_{id}^{k} + V_{id}^{k+1}

where d = 1, 2, \cdots, D indexes the dimension; i = 1, 2, \cdots, n indexes the particle; k is the current iteration number; V_{id}^{k} is the velocity of particle i in dimension d at iteration k; P_{id}^{k} is the individual optimal position found by particle i in dimension d up to iteration k; P_{gd}^{k} is the global optimal position found by the whole swarm in dimension d up to iteration k; c_1 and c_2 are learning factors, which adjust the maximum step size toward the individual optimal position and the group optimal position; r_1 and r_2 are random numbers uniformly distributed in [0, 1]; and \omega is the inertia weight, a parameter introduced to balance the global search ability and the local search ability; the larger its value, the larger the range of the search. In order to prevent a blind search by the particles, it is generally recommended to limit their position and velocity to a certain interval:

X_{id} \in [X_{min}, X_{max}], \quad V_{id} \in [-V_{max}, V_{max}]
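The update equations above can be sketched as a minimal PSO minimizer. A toy sphere objective stands in for the classification-error fitness of the paper, and all parameter values here (swarm size, bounds, c_1, c_2, \omega) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def pso(fitness, dim, n_particles=20, iters=50,
        w=0.7, c1=1.5, c2=1.5, x_bounds=(-5.0, 5.0), v_max=1.0):
    """Minimal PSO minimizer following the velocity/position updates above."""
    lo, hi = x_bounds
    X = rng.uniform(lo, hi, (n_particles, dim))         # positions
    V = rng.uniform(-v_max, v_max, (n_particles, dim))  # velocities
    P = X.copy()                                        # individual bests
    Pf = np.array([fitness(x) for x in X])
    g = P[Pf.argmin()].copy()                           # global best
    gf = float(Pf.min())
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)
        V = np.clip(V, -v_max, v_max)                   # velocity interval
        X = np.clip(X + V, lo, hi)                      # position interval
        f = np.array([fitness(x) for x in X])
        better = f < Pf
        P[better], Pf[better] = X[better], f[better]
        if Pf.min() < gf:
            gf = float(Pf.min())
            g = P[Pf.argmin()].copy()
    return g, gf

# Example: minimize the sphere function; the optimum is at the origin.
best_x, best_f = pso(lambda x: float(np.sum(x ** 2)), dim=3)
```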

Intrusion Detection Based on the PSO-DBN
UAV mobile ad hoc network intrusion detection can be regarded as a classification problem. First, the intrusion detection dataset is preprocessed. The preprocessing process is shown in Figure 2. Each connection record in the KDD Cup 99 dataset consists of 41 attribute features, including 3 symbolic features and 38 numeric features. In this paper, the attribute mapping method is used to transform symbolic features into numeric features. For example, there are three values for the attribute feature, 'protocol type,' in column 2: tcp, udp, and icmp, which can be processed according to tcp = 1, udp = 2, and icmp = 3. Similarly, the 70 symbol values of the attribute feature, 'service,' and the 11 symbol values of the 'flag' can establish the mapping relationship between the symbol value and the corresponding numerical value.
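The attribute-mapping step can be sketched as follows. The partial 'service' and 'flag' maps shown are illustrative placeholders, not the full 70- and 11-value tables of the dataset:

```python
# Symbolic columns of a KDD Cup 99 record are replaced by integer codes.
protocol_map = {"tcp": 1, "udp": 2, "icmp": 3}
# 'service' and 'flag' maps are built the same way; only a few of their
# 70 / 11 symbol values are shown here for illustration.
service_map = {"http": 1, "smtp": 2, "ftp": 3}
flag_map = {"SF": 1, "REJ": 2, "S0": 3}

def map_record(record):
    """Replace the symbolic fields (columns 2-4 of a record) with codes."""
    out = list(record)
    out[1] = protocol_map[out[1]]
    out[2] = service_map[out[2]]
    out[3] = flag_map[out[3]]
    return out

row = [0, "tcp", "http", "SF", 181, 5450]  # truncated toy record
print(map_record(row))  # -> [0, 1, 1, 1, 181, 5450]
```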
Then, the obtained data are normalized within the range [0, 1], according to Equation (10), to ensure that the attributes are on the same order of magnitude:

x'(i) = \frac{x(i) - x_{min}}{x_{max} - x_{min}}    (10)

In the equation, x'(i) is the normalized value of the input variable; x(i) is the original value of the input variable; and x_{max} and x_{min} are the maximum and minimum values of the original data, respectively.
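A minimal sketch of the min-max normalization of Equation (10), applied column-wise; the guard against constant columns is an added assumption, not part of the paper:

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column of X into [0, 1] via (x - min) / (max - min)."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # guard constant cols
    return (X - x_min) / span

X = np.array([[1.0, 200.0],
              [3.0, 400.0],
              [5.0, 300.0]])
print(min_max_normalize(X))
```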
After preprocessing the intrusion detection data, the DBN network structure is initialized, and the PSO algorithm is then used to optimize the number of nodes in each hidden layer of the DBN to obtain an optimal network structure. Common hyperparameters of the DBN include the learning rate, the number of network layers, and the number of nodes in each layer. The learning rate mainly controls the learning progress of the model: the larger the learning rate, the faster the learning speed, and a suitable value can generally be set from empirical values or prior studies. As for the number of network layers, the more layers, the more complicated the computation; compared to image processing, the dimension of the dataset used in this paper is not very high, and the selected number of network layers can meet the requirements of intrusion detection. In the DBN, the choice of the number of nodes in each layer is very important: it not only has a great impact on the performance of the resulting DBN model, but an improper choice can also easily lead to overfitting during training. At present, the formulas proposed in most of the literature for determining the number of nodes in each layer assume very large training samples, and their results are not necessarily optimal; in fact, the node counts obtained from different formulas vary greatly. In order to avoid overfitting during training as much as possible, and to ensure sufficient network performance and generalization ability, it is necessary to optimize the number of nodes in each layer.
The intrusion detection model based on the PSO-DBN is shown in Figure 3. The process of building the DBN includes pre-training and reverse fine-tuning. First, the forward propagation of the DBN is established through the training of the RBM model, and better initial model parameters are obtained. Then, the output error information of the training samples is calculated by the BP algorithm and propagated, from top to bottom, in each layer of the RBM, and the parameters of the DBN model are finely adjusted.
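The greedy layer-wise pre-training described above can be sketched as follows: each RBM is trained with CD-1, and its hidden activation probabilities become the training data for the next RBM. The toy data, layer sizes, and training settings are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=20, eps=0.1):
    """Train one RBM with CD-1 and return its parameters (W, a, b)."""
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hidden))
    a, b = np.zeros(n_vis), np.zeros(n_hidden)
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + b)                       # positive phase
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + a)                       # reconstruction
        ph1 = sigmoid(pv1 @ W + b)                        # negative phase
        n = data.shape[0]
        W += eps * (data.T @ ph0 - pv1.T @ ph1) / n
        a += eps * (data - pv1).mean(axis=0)
        b += eps * (ph0 - ph1).mean(axis=0)
    return W, a, b

def pretrain_dbn(data, layer_sizes):
    """Greedy layer-wise pre-training: stack RBMs bottom-up."""
    stack, x = [], data
    for n_hidden in layer_sizes:
        W, a, b = train_rbm(x, n_hidden)
        stack.append((W, a, b))
        x = sigmoid(x @ W + b)  # hidden probabilities feed the next RBM
    return stack

# Toy binary data: 50 samples, 10 features; two hidden layers of 8 and 4.
X = (rng.random((50, 10)) < 0.5).astype(float)
dbn = pretrain_dbn(X, layer_sizes=[8, 4])
```

After this pre-training stage, the stacked weights would initialize a BP network for the supervised fine-tuning step.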
In the process of optimizing the number of nodes in the hidden layers, the prediction error of the classifier is selected as the fitness function of the model. Through the iteration of the PSO, the number of nodes in the hidden layers of the DBN is continually updated, and the optimized PSO-DBN model is obtained. Once the PSO-DBN structure is determined, supervised learning is performed using BP to further improve the values of the node weights; therefore, a suitable number of BP epochs is assigned for this learning stage.
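The node-count optimization loop can be sketched as follows. Training a full DBN for every fitness evaluation is expensive, so a synthetic error surface stands in for the classifier's prediction error here; its minimum is placed at the node counts reported in the experiments (39, 29, 14, 7) purely for illustration, and all PSO settings are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def classifier_error(nodes):
    """Stand-in fitness: in the paper this is the DBN's prediction error
    after training with the given hidden-layer sizes. Here a synthetic
    quadratic bowl with minimum at (39, 29, 14, 7) is used instead."""
    target = np.array([39, 29, 14, 7])
    return float(np.sum((np.asarray(nodes) - target) ** 2)) / 1000.0

def pso_layer_sizes(fitness, dim=4, n=15, iters=60,
                    w=0.7, c1=1.5, c2=1.5, lo=2, hi=60):
    """PSO over continuous positions; node counts are the rounded values."""
    X = rng.uniform(lo, hi, (n, dim))
    V = np.zeros((n, dim))
    P, Pf = X.copy(), np.array([fitness(np.rint(x)) for x in X])
    g, gf = P[Pf.argmin()].copy(), float(Pf.min())
    for _ in range(iters):
        r1, r2 = rng.random((2, n, dim))
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)
        X = np.clip(X + V, lo, hi)
        f = np.array([fitness(np.rint(x)) for x in X])
        better = f < Pf
        P[better], Pf[better] = X[better], f[better]
        if Pf.min() < gf:
            gf, g = float(Pf.min()), P[Pf.argmin()].copy()
    return np.rint(g).astype(int), gf

sizes, err = pso_layer_sizes(classifier_error)
print(sizes)  # candidate node counts for the four hidden layers
```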

Dataset and Evaluation Indicators
This paper uses the KDD Cup 99 dataset as the training and testing set. The dataset is derived from the intrusion detection assessment program of the US Defense Advanced Research Projects Agency (DARPA) and is hosted by the MIT Lincoln Laboratory. It is the benchmark dataset of network intrusion detection: it provides labeled training and test data for researchers and is widely used for testing various intrusion detection methods. In this paper, 10% of the data was randomly selected from the "10% KDD Cup 99 training set" as the training data, and 10% of the data was randomly selected from the "KDD Cup 99 corrected labeled test data set" as the test data. The specific data distribution is shown in Table 1. In intrusion detection systems, the accuracy (ACC), precision (PRE), detection rate (DR), and false alarm rate (FAR) are usually used as evaluation indicators. ACC is the proportion of correctly classified samples and is defined as follows:

ACC = \frac{TP + TN}{TP + TN + FP + FN}

where TP refers to the number of positive instances detected as positive, TN refers to the number of negative instances detected as negative, FP refers to the number of negative instances detected as positive, and FN refers to the number of positive instances detected as negative.
PRE is the proportion of samples with actual intrusion behavior among the samples detected as intrusions, and it is defined as

PRE = \frac{TP}{TP + FP}

DR is the proportion of detected intrusion samples in the total number of intrusion samples, and it is defined as

DR = \frac{TP}{TP + FN}

FAR is the proportion of normal samples falsely reported as intrusions in the total number of normal samples, and it is defined as

FAR = \frac{FP}{FP + TN}

The average reconstruction error (ARE) between the reconstructed data and the original data in each RBM network is also used as a criterion for performance evaluation, and it is calculated as follows:

ARE = \frac{1}{n} \sum_{k=1}^{n} \sum_{i} \left( v_{ki} - \hat{v}_{ki} \right)^2

where k is the sample number; v_{ki} is the original data of the kth sample; \hat{v}_{ki} is the reconstructed data of the kth sample; and n is the number of samples.
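The four indicators can be computed directly from the confusion counts defined above. A minimal sketch with made-up counts (not the paper's results):

```python
def detection_metrics(tp, tn, fp, fn):
    """ACC, PRE, DR, and FAR from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    dr = tp / (tp + fn)   # detection rate: recall on the attack class
    far = fp / (fp + tn)  # false alarm rate on the normal class
    return acc, pre, dr, far

# Illustrative counts only.
acc, pre, dr, far = detection_metrics(tp=900, tn=850, fp=50, fn=100)
print(f"ACC={acc:.3f} PRE={pre:.3f} DR={dr:.3f} FAR={far:.3f}")
```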

Results and Comparison
The experimental environment of this paper is based on MATLAB R2013a and the data mining software, Weka. Compared to image processing, the dimension of the KDD Cup 99 dataset is not very high, so a DBN structure with four hidden layers can satisfy the experimental requirements. In order to verify the superiority of the PSO-DBN algorithm, proposed in this paper, a comparative experiment is carried out using the ANN, SVM, DNN, and Adaboost algorithms. The parameters of the PSO algorithm are shown in Table 2.  Figure 4 shows the results of the PSO algorithm for optimizing the number of hidden layer nodes under different iterations. In this paper, the classification error is used as the fitness function. In the process of PSO optimization, when the classification error is at its minimum, the optimal result can be obtained. The experimental results show that the numbers of hidden layer nodes optimized by the PSO algorithm are 39, 29, 14, and 7, and the minimum error is 0.0923.
The average reconstruction error of the RBM is shown in Figure 5. As can be seen from the figure, the higher the number of iterations of the RBM, the smaller the average reconstruction error. When the number of iterations is more than 4, the average reconstruction error tends to be flat.
The accuracy under different BP epochs in the PSO-DBN model is shown in Figure 6. It can be seen from the figure that, when the number of epochs is small, the accuracy of the proposed model increases with the number of BP iterations. When the number of epochs reaches 37, the accuracy of the model reaches a maximum of 92.44%; after that, the accuracy decreases with further epochs and tends to be flat. Therefore, the effect of the model is optimal when the number of BP epochs is 37.
The PSO-DBN model proposed in this paper is compared to the ANN, SVM, Adaboost, and DNN classification methods. The ANN algorithm uses a three-layer structure, including an input layer, an intermediate layer, and an output layer; the other parameters are similar to those of the DBN. The type of the SVM algorithm is set to C-support vector classification (C-SVC), and the kernel function is the radial basis function. The weight threshold of the Adaboost algorithm is set to 100, and the number of iterations is set to 10. The DNN algorithm uses an eight-layer structure, including an input layer, six intermediate layers, and an output layer. The performance comparison of ANN, SVM, Adaboost, DNN, and PSO-DBN is shown in Table 3. It can be seen from the table that, in dealing with the classification problem of intrusion detection, PSO-DBN has the lowest false alarm rate; the highest accuracy, detection rate, and precision; and the best classification effect.
For the optimization problem in this paper, comparative experiments with the genetic algorithm (GA), the simulated annealing algorithm (SA), and the Bayesian optimization algorithm (BOA) are carried out. The genetic algorithm is a computing model that simulates natural selection and the genetic mechanism; it finds the optimal solution by simulating the process of natural evolution. The simulated annealing algorithm imitates the behavior of a heated object during the annealing process to find the optimal solution; the actual effect of the algorithm is strongly affected by its parameters. The Bayesian optimization algorithm finds an acceptable extreme value by modeling the black-box objective function without knowing its form; its advantage is that it requires fewer iterations, but it easily falls into a local optimum. The parameters of the GA and SA are shown in Tables 4 and 5, and the description of the BOA is shown in Table 6. Table 4. Parameters of the GA.

Number of generations: 20; population size: 10; mutation rate: 0.05; crossover rate: 0.8.

Table 5. Parameters of the SA.

Decay scale: 0.85; step factor: 0.2; start temperature: 8; final temperature: 3.

Table 6. Description of the Bayesian optimization algorithm (BOA).

Prior function: Gaussian process regression; acquisition function: EI; number of iterations: 30; objective function: classification error.

Table 7 shows the optimized number of nodes in each hidden layer. According to the optimized number of nodes in each hidden layer, the classification effect of the DBN network can be obtained. Table 8 shows the results of intrusion detection under the various optimization algorithms. As can be seen from the table, the DBN network optimized by PSO has the lowest false alarm rate and the highest accuracy, detection rate, and precision. Therefore, the optimization effect of PSO is the best among the four optimization algorithms.

Conclusions
Intrusion detection for UAV networks is an important subject in the field of UAV network security, and the deep belief network optimized by PSO is a very effective method for it. Through the unsupervised learning of the RBM and the supervised learning of the BP, the DBN can effectively solve the intrusion detection problem of massive, high-dimensional, and nonlinear data. The DBN not only has a strong feature extraction ability for high-dimensional feature vectors, but also an efficient classification ability. Based on the DBN method, the PSO algorithm is used to optimize the number of hidden layer nodes of the DBN and thus its network structure. The experimental results show that the accuracy of the PSO-DBN algorithm proposed in this paper reaches 92.44%, which is higher than those achieved by the ANN, SVM, Adaboost, and DNN methods. In addition, the optimization effect of PSO is better than those of the GA, SA, and BOA. The PSO-DBN algorithm is well suited to information extraction in high-dimensional spaces, improves the intrusion recognition ability, and provides an effective solution to the problem of intrusion detection.