Radar HRRP Target Recognition Based on Stacked Autoencoder and Extreme Learning Machine

A novel radar high-resolution range profile (HRRP) target recognition method based on a stacked autoencoder (SAE) and an extreme learning machine (ELM) is presented in this paper. As a key component of deep structures, the SAE not only learns features from the data but also obtains feature representations at different levels. However, with a deep structure, it is hard to achieve good generalization performance together with a fast learning speed. ELM, a learning algorithm for single hidden layer feedforward neural networks (SLFNs), has attracted great interest from various fields for its fast learning speed and good generalization performance. However, ELM needs more hidden nodes than conventional tuning-based learning algorithms because its input weights and hidden biases are set randomly. In addition, existing ELM methods cannot make good use of the class information of targets. To solve this problem, a regularized ELM method based on the class information of the target is proposed. In this paper, SAE and the regularized ELM are combined to make full use of their advantages and to compensate for each other's shortcomings. The effectiveness of the proposed method is demonstrated by experiments with measured radar HRRP data. The experimental results show that the proposed method performs well in terms of both real-time operation and accuracy, especially when only a few training samples are available.


Introduction
Radar target recognition based on high-resolution range profiles (HRRP) has become a research hotspot because HRRP data are relatively easy to acquire and process [1][2][3][4][5][6][7]. However, non-cooperative recognition [8,9] with limited training samples is a challenging task. In the non-cooperative situation, such as during battle, the amount of test data is usually huge but the training data are limited. This is because the radar system cannot be guaranteed to detect and track non-cooperative targets for a long period of time, so HRRP data may be lost or never observed. Therefore, it is very important to study the generalization performance of the recognition model and to obtain good recognition performance with fewer training samples.
It is generally known that feature extraction is a key step in radar target recognition. The quality of the extracted features determines the performance of target recognition. Therefore, many scholars [3,6,10-15] have spent a lot of effort studying methods of HRRP feature extraction. In [3], the principal component analysis (PCA) subspace model is utilized to minimize the reconstruction error. The multitask learning truncated stick-breaking hidden Markov model (MTL TSB-HMM) proposed in [6] is used to characterize the fast Fourier transform (FFT) magnitude features of HRRP. Some other researchers [10,11] have used
The contributions of this paper are as follows:
(a) The proposed model is "end to end": the input is the original radar HRRP data and the output is the target class.
(b) This paper proposes a combination of SAE and regularized ELM, which can improve the recognition performance by making full use of the advantages of SAE and ELM. Compared with shallow learning algorithms such as PCA [3], MTL TSB-HMMs [6], and ELM [26], the proposed algorithm can extract the inherent characteristics of the target. Since the network does not need to be fine-tuned, the proposed algorithm is faster than other deep learning models [18,23-25].
(c) The proposed method not only improves the training speed but also achieves good performance when the training set is small.
The rest of this paper is organized as follows: Section 2 introduces the relevant theoretical knowledge of SAE and ELM. In Section 3, we present the regularized ELM, then we also introduce the learning process of SAE-ELM. Experimental results are analyzed in Section 4, and in Section 5 the paper is summarized.

Description of HRRP
HRRP can be regarded as the amplitude of the coherent summations of the complex time returns from target scatterers in each range cell [3], which represents the projection of the complex returned echoes from the target scattering centers onto the radar line-of-sight (LOS) [4]. An HRRP sample from a plane target is illustrated in Figure 1. Since HRRP contains important structural features of the target, such as its size and the distribution of its scattering centers, radar HRRP target recognition has drawn much attention from the radar automatic target recognition community [3][4][5][6][7].

Stacked Autoencoder
An autoencoder (AE) is an unsupervised learning algorithm. Figure 2 shows a simple model structure for an AE:

Given an unlabeled dataset $\{x^{(i)}\}_{i=1}^{m}$, each training sample $x^{(i)}$ is encoded by an encoder and the feature representation $y^{(i)}$ of the hidden layer can be obtained:

$$y^{(i)} = f_{\theta}(x^{(i)}) = s(Wx^{(i)} + b),$$

where $\theta = (W, b)$ is the network parameter, $W$ is the weight matrix, $b$ is the bias vector, and $s(x)$ is the activation function; the sigmoid function is selected here and $s(x) = 1/(1 + e^{-x})$. Then, the feature representation $y^{(i)}$ of the hidden layer is decoded by the decoder and the reconstruction vector $z^{(i)}$ can be obtained:

$$z^{(i)} = g_{\theta'}(y^{(i)}) = s(W'y^{(i)} + b'),$$

where $\theta' = (W', b')$, $W'$ is the weight matrix and $W' = W^{T}$. In fact, the optimization of the model parameters is to minimize the reconstruction error [16]:

$$\theta^{*}, \theta'^{*} = \arg\min_{\theta, \theta'} \frac{1}{m}\sum_{i=1}^{m} J(x^{(i)}, z^{(i)}), \qquad (1)$$

where $m$ is the sample number and $J$ is the cost function, $J(x, z) = \frac{1}{2}\|z - x\|^{2}$. For a dataset containing $m$ samples, the total cost function is:

$$J(W, b) = \frac{1}{m}\sum_{i=1}^{m} \frac{1}{2}\left\|z^{(i)} - x^{(i)}\right\|^{2} + \frac{\lambda}{2}\sum_{l=1}^{n_l - 1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \left(W_{ji}^{(l)}\right)^{2}, \qquad (2)$$

where $W_{ji}^{(l)}$ is the connection weight between the $i$-th neuron of layer $l$ and the $j$-th neuron of layer $l + 1$; $\lambda$ is the weight decay coefficient; $n_l$ and $s_l$ indicate the number of network layers and the number of neurons of layer $l$, respectively. The first part of Equation (2) is a mean squared error term and the second part is a weight decay term, which trades off small weights against a minimized cost [21] and is intended to prevent overfitting [19].
If the number of hidden layer nodes is large, and even more than the number of input layer nodes, a sparsity constraint needs to be added on the hidden units [19]. With a sigmoid activation function, the hidden units are constrained to be close to zero (inactive) most of the time [43]. This is motivated by the structure of the brain, in which most of the neurons are inactive most of the time. By forcing the hidden units to have mostly near-zero activations, interesting representations can be learned. Then, the overall cost function is expressed as follows:

$$J_{\mathrm{sparse}}(W, b) = J(W, b) + \eta\sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j), \qquad (3)$$

where the second part of Equation (3) represents the sparse penalty term; the penalty term used in this paper is based on the Kullback-Leibler (KL) divergence [44]. KL indicates the relative entropy [24] between two Bernoulli random variables with mean $\rho$ and mean $\hat{\rho}_j$, and

$$\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho\log\frac{\rho}{\hat{\rho}_j} + (1 - \rho)\log\frac{1 - \rho}{1 - \hat{\rho}_j}.$$

Here $\rho$ is the target sparsity level and $\hat{\rho}_j$ is the average activation of hidden unit $j$ over the training set. If $\hat{\rho}_j = \rho$, $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$ reaches its minimum value of 0, and if $\hat{\rho}_j$ approaches 0 or 1, $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$ increases dramatically. $s_2$ is the number of neurons in the hidden layer and $\eta$ is the weight of the sparsity penalty.
SAE is a neural network consisting of multiple layers of autoencoders, and the structure of an SAE is shown in Figure 3. We can use a greedy layer-wise training method to train SAE; that is, the output of each layer is wired to the input of the successive layer. Then, the BP algorithm is used to fine-tune the whole network.
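To make the roles of the three cost terms concrete, the following is a minimal NumPy sketch of the sparse autoencoder cost for a single tied-weight layer. It is an illustration under stated assumptions, not the implementation used in the paper: the function names, the decoder bias `b_dec`, and the default values of `lam` (weight decay), `eta` (sparsity weight), and `rho` (target sparsity) are hypothetical choices, and the inputs are assumed to be scaled to [0, 1] so that a sigmoid decoder can reconstruct them.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def kl_bernoulli(rho, rho_hat):
    """KL(rho || rho_hat) between two Bernoulli variables, elementwise."""
    return (rho * np.log(rho / rho_hat)
            + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))


def sparse_ae_cost(X, W, b, b_dec, lam=1e-4, eta=3.0, rho=0.05):
    """Cost of one tied-weight sparse autoencoder layer.

    X: (m, n) data, one sample per row; W: (h, n) encoder weights;
    b: (h,) encoder bias; b_dec: (n,) decoder bias (hypothetical name).
    lam, eta, rho are illustrative defaults, not values from the paper.
    """
    m = X.shape[0]
    Y = sigmoid(X @ W.T + b)        # hidden representation y = s(Wx + b)
    Z = sigmoid(Y @ W + b_dec)      # reconstruction with decoder weights W^T
    recon = 0.5 * np.sum((Z - X) ** 2) / m
    # Weight decay over both layers; with tied weights they hold the same entries.
    decay = 0.5 * lam * (np.sum(W ** 2) + np.sum(W.T ** 2))
    rho_hat = Y.mean(axis=0)        # average activation of each hidden unit
    sparsity = eta * np.sum(kl_bernoulli(rho, rho_hat))
    return recon + decay + sparsity


# Toy usage with random data in [0, 1].
rng = np.random.default_rng(0)
X = rng.random((20, 8))
W = 0.1 * rng.standard_normal((4, 8))
cost = sparse_ae_cost(X, W, np.zeros(4), np.zeros(8))
```

In practice this cost would be minimized by gradient descent one layer at a time, with each trained layer's hidden output becoming the next layer's input.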

Extreme Learning Machine
Given a set of $N$ training samples $(x_i, t_i)$, $i = 1, 2, \cdots, N$, with $x_i = [x_{i1}, x_{i2}, \cdots, x_{in}]^{T} \in R^{n}$ and $t_i = [t_{i1}, t_{i2}, \cdots, t_{im}]^{T} \in R^{m}$, $x_i$ is an $n$-dimensional input vector and $t_i$ is the expected output. The output function of ELM with $L$ hidden nodes is represented as follows:

$$\sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \cdots, N, \qquad (4)$$

where $w_i = [w_{i1}, w_{i2}, \cdots, w_{in}]^{T} \in R^{n}$ is the weight vector from the input nodes to the $i$-th hidden node and $b_i$ is the bias of the $i$-th hidden node; $\beta_i = [\beta_{i1}, \beta_{i2}, \cdots, \beta_{im}]^{T} \in R^{m}$ is the weight vector between the $i$-th hidden node and the output nodes; $g(x)$ is the activation function of the hidden layer; and $o_j$ is the output vector.
If the SLFN with $L$ hidden nodes can approximate the $N$ samples with zero error, Equation (4) can be converted to the following formula [26,45]:

$$\sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, 2, \cdots, N. \qquad (5)$$

The above equations can be written as:

$$H\beta = T, \qquad (6)$$

where

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}, \qquad (7)$$

$$\beta = \begin{bmatrix} \beta_1^{T} \\ \vdots \\ \beta_L^{T} \end{bmatrix}_{L \times m}, \quad T = \begin{bmatrix} t_1^{T} \\ \vdots \\ t_N^{T} \end{bmatrix}_{N \times m}. \qquad (8)$$

So training the SLFN corresponds to finding the minimum-norm least-squares solution $\hat{\beta}$, which can be shown as follows:

$$\hat{\beta} = H^{+}T, \qquad (9)$$

where $H^{+}$ is the Moore-Penrose generalized inverse [46,47] of the hidden layer output matrix $H$. With a regularization term added, Equation (9) can be converted to:

$$\hat{\beta} = \left(\frac{I}{C} + H^{T}H\right)^{-1} H^{T}T, \qquad (10)$$

where $I$ is the unit matrix and $C$ is the regularization coefficient.
ELM can also be explained using the optimization method. The ELM theory aims to reach the smallest training error $\|H\beta - T\|^{2}$ and the smallest norm of the output weights $\|\beta\|$ [28,31]. Then, the solution of Equation (6) can be obtained by:

$$\min_{\beta}\ \frac{1}{2}\|\beta\|^{2} + \frac{C}{2}\sum_{i=1}^{N}\|\xi_i\|^{2}, \quad \text{s.t. } h(x_i)\beta = t_i^{T} - \xi_i^{T}, \quad i = 1, 2, \cdots, N, \qquad (11)$$

where $\xi_i$ is the training error vector of the $m$ output nodes corresponding to training sample $x_i$, and $h(x_i)$ is the hidden layer output vector of the $i$-th sample $x_i$. According to the Karush-Kuhn-Tucker (KKT) theorem [48], the same solution as Equation (10) can be obtained. Thus, the learning steps of the ELM can be summarized as Algorithm 1.
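The ELM training procedure described above (random hidden parameters, hidden layer output matrix, regularized least-squares output weights) can be sketched in a few lines of NumPy. This is a hedged illustration rather than the paper's code: the function names, the uniform range used for the random weights, and the toy two-class data are assumptions, and the output weights use the ridge form $(I/C + H^{T}H)^{-1}H^{T}T$ of the regularized solution.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def elm_train(X, T, L, C, rng):
    """Train an ELM. X: (N, n) inputs, T: (N, m) targets, L: hidden nodes."""
    n = X.shape[1]
    # Random input weights and hidden biases; the [-1, 1] range is an assumption.
    W = rng.uniform(-1.0, 1.0, size=(n, L))
    b = rng.uniform(-1.0, 1.0, size=L)
    H = sigmoid(X @ W + b)                      # hidden layer output matrix, (N, L)
    # Regularized least squares: solve (I/C + H^T H) beta = H^T T.
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
    return W, b, beta


def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta


# Toy usage: two well-separated Gaussian classes with one-hot targets.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(100, 2)),
               rng.normal(2.0, 0.5, size=(100, 2))])
labels = np.repeat([0, 1], 100)
T = np.eye(2)[labels]
W, b, beta = elm_train(X, T, L=20, C=10.0, rng=rng)
acc = np.mean(np.argmax(elm_predict(X, W, b, beta), axis=1) == labels)
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit inverse; only `beta` is learned, which is why no iterative tuning of `W` and `b` is needed.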

Stacked Autoencoder-Regularized Extreme Learning Machine
Because samples of the same class have similar attributes and distribution features, we can use these similarities to enhance the generalization performance of ELM. Therefore, in this section, we propose a regularized ELM based on the class information of the target. Optimizing the output weights by maximizing the within-class scatter degree and minimizing the inter-class scatter degree gives the ELM better recognition and generalization ability. In addition, because the input weights and hidden biases are selected randomly, ELM tends to need more hidden nodes to achieve better generalization performance, which makes the network structure complex. To address this issue, SAE is used to optimize the input weights and hidden biases of ELM; this achieves better results with fewer hidden layer nodes.

Regularized ELM Based on the Class Information of the Target
Given a sample set $\{x_i^{(j)};\ j = 1, 2, \cdots, c;\ i = 1, 2, \cdots, n_j\}$, the number of classes is $c$ and class $\omega_j$ contains $n_j$ samples. The inter-class scatter matrix of class $\omega_j$ is defined as where $m_j$ is the mean of the samples of class $\omega_j$ and $m_j = \frac{1}{n_j}\sum_{i=1}^{n_j} x_i^{(j)}$. The total inter-class scatter matrix is defined as The within-class scatter matrix is defined as where $m$ is the mean of all samples and $m = \frac{1}{\sum_{j=1}^{c} n_j}\sum_{j=1}^{c}\sum_{i=1}^{n_j} x_i^{(j)}$. To improve the recognition performance, we should maximize the within-class scatter matrix and minimize the inter-class scatter matrix [49]. Therefore, we define the matrix $S$ as shown below: Then, the optimization formula of regularized ELM can be written as: We can solve the above problem by defining the Lagrange function: then where Then, the solution to Equation (16) is: Thus, the learning steps of the regularized ELM can be summarized as Algorithm 2:

(2) Set random values to the input weights $w_i$ and the hidden layer biases $b_i$;
(3) Calculate the hidden layer output matrix $H$ according to Equation (7);
(4) Calculate the output weight vector $\beta$ according to Equation (19).

SAE-ELM
In order to implement recognition, we need to add a classifier to the top encoding layer of SAE. In this section, we propose using ELM instead of softmax as the classifier, which can effectively improve the network training speed. In addition, we can get appropriate ELM network parameters by training SAE. The SAE-ELM system architecture is shown in Figure 4, and the illustration of the structure is shown in Figure 5. The learning process of SAE-ELM is as follows:

(1) Establish the first layer of the AE network and, as described in Section 2.2, use the gradient descent method to train the network. Then, we can obtain the output $H_1$ of the first hidden layer and the network parameters $\theta_1 = (W_1, b_1)$. $H_1$ is the feature representation of the input data.
(2) Establish the second layer of the AE network. The first-layer output $H_1$ serves as the input of the second layer. We use the gradient descent method to train the network. Then, the output $H_2$ of the second hidden layer and the network parameters $\theta_2 = (W_2, b_2)$ are available.
(3) Establish the third layer of the AE network to determine the parameters of ELM. ELM not only learns faster than the traditional learning methods but also has good generalization performance. However, ELM needs more hidden nodes than conventional tuning-based learning algorithms because its input weights and hidden biases are set randomly. Therefore, we establish the third layer of the AE network to determine the input weights and hidden biases for ELM. Similar to step (2), the output $H_3$ of the third hidden layer and the network parameters $\theta_3 = (W_3, b_3)$ can be obtained. We use $W_3$ as the input weights and $b_3$ as the hidden biases of ELM, so the hidden layer output matrix of ELM is $H_3$.
(4) Establish the ELM network as a classifier. The input is $H_2$, the input weights and hidden biases are $\theta_3 = (W_3, b_3)$, and the hidden layer output matrix is $H_3$. Then, as described in Section 3.1, the output weight vector $\beta$ can be calculated according to Equation (19).

Figure 5. Illustration of the structure of SAE-ELM (input, hidden layers 1 to 3, output).
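Step (4) above can be illustrated with a short NumPy sketch in which the encoder parameters of the three AE layers are taken as given and only the output weights are solved for. Several things here are assumptions made for illustration: the function names are hypothetical, random matrices stand in for the pretrained $(W_1, b_1)$, $(W_2, b_2)$, $(W_3, b_3)$, the layer sizes are much smaller than the 256-1500-500-50 network used in the experiments, and the output weights use the ordinary ridge-regularized least-squares solution rather than the class-information solution of Equation (19).

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def forward(X, layers):
    """Pass X through encoder layers [(W1, b1), (W2, b2), (W3, b3)].

    Each W is stored as (inputs, outputs), so a layer computes s(H W + b).
    The output of the last layer plays the role of the ELM matrix H3.
    """
    H = X
    for W, b in layers:
        H = sigmoid(H @ W + b)
    return H


def fit_output_weights(H3, T, C):
    """Ridge-regularized least squares; a stand-in for Equation (19)."""
    L = H3.shape[1]
    return np.linalg.solve(np.eye(L) / C + H3.T @ H3, H3.T @ T)


def predict(X, layers, beta):
    return np.argmax(forward(X, layers) @ beta, axis=1)


# Toy usage: random stand-ins for pretrained AE layers, 3 classes.
rng = np.random.default_rng(0)
sizes = [16, 32, 16, 8]
layers = [(0.1 * rng.standard_normal((a, b)), np.zeros(b))
          for a, b in zip(sizes[:-1], sizes[1:])]
X = rng.random((60, 16))
labels = rng.integers(0, 3, size=60)
T = np.eye(3)[labels]
beta = fit_output_weights(forward(X, layers), T, C=0.2)
preds = predict(X, layers, beta)
```

Because the hidden parameters come from the pretrained layers, the only supervised computation is one linear solve, which is consistent with the training-time advantage over fine-tuning the whole network.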

Experimental Results and Discussion
In this section, we will verify the effectiveness of the proposed algorithm. The experiments were performed on an Intel(R) Core(TM) 3.60 GHz CPU with 8 GB of RAM and the MATLAB R2013a environment.
To validate the effectiveness of the proposed method, we use measured HRRP data from three real airplanes, collected by a C-band radar with a center frequency of 5.52 GHz and a bandwidth of 400 MHz. The An-26 is a medium-sized propeller airplane, the Yark-42 is a large and medium-sized jet airplane, and the Citation business jet is a small-sized jet airplane. The three aircraft models are shown in Figure 6. The detailed size of each airplane and the parameters of the measured radar are listed in Table 1. In our experiments, each aircraft target has 26,000 HRRP samples and the measured HRRP is a 256-dimensional vector.
In order to verify the validity of the algorithm proposed in this paper, we compared it with the commonly used methods: PCA [3], MTL TSB-HMMs [6], ELM [26], SAE [21], and DDAEs [23]. The activation function of the hidden layer of ELM is the sigmoid $G(a, b, x) = 1/(1 + \exp(-(a \cdot x + b)))$. The regularization coefficient $C$ is 0.2. The number of hidden nodes of ELM is 1500. Because the sample dimension is 256, we set the number of nodes in the visible layer of the deep architecture to 256. It is well known that a more abstract feature representation can be obtained with an increase in the network depth. However, too many layers can make the network difficult to train effectively and bring in more parameters to learn. Through the analysis of the experimental data and task requirements, we found that three is a good choice for the number of hidden layers. Therefore, we set the number of hidden layers to three and the number of nodes in the hidden layers to 1500-500-50, respectively.
From Figure 7 we can see that the mean square error (MSE) of each layer reconstruction of the network model decreases with an increase of iterations. When the number of iterations is 25, the MSE is less than 0.003. Therefore, in order to speed up the training, we set the number of iterations in the network to 25.
Before network training, data pre-processing is needed to handle the amplitude-scale and time-shift sensitivities of HRRP. Following previous studies [3,5,6,7], we use the energy normalization method and a time-shift compensation algorithm to cope with these issues. Figure 8 shows the range profiles of the pre-processed aircraft targets. In the non-cooperative situation, such as during battle, the amount of test data is usually huge but the training data are limited. This is because the radar system cannot be guaranteed to detect and track non-cooperative targets for a long period of time, so HRRP data may be lost or never observed. Therefore, it is very important to study the generalization performance of the model and to obtain good recognition performance with fewer training samples.
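The two pre-processing steps can be sketched as follows. The energy normalization is the usual scaling to unit L2 norm. The text does not specify which time-shift compensation algorithm is used, so the centroid alignment below is only one common choice, assumed here for illustration; the function names are hypothetical, and the linear centroid assumes a nonnegative profile whose target support does not wrap around the ends of the range window.

```python
import numpy as np


def energy_normalize(x):
    """Scale an HRRP so that its L2 norm (energy) equals 1."""
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x


def centroid_align(x):
    """Circularly shift an HRRP so its amplitude centroid sits at the middle cell.

    Assumes a nonnegative, nonzero profile whose support does not wrap around.
    """
    n = x.size
    idx = np.arange(n)
    centroid = int(round(float(np.sum(idx * x) / np.sum(x))))
    return np.roll(x, n // 2 - centroid)


# Toy usage: the same 256-cell profile observed at two different range offsets.
x = np.zeros(256)
x[10:13] = [1.0, 2.0, 1.0]
a = centroid_align(energy_normalize(x))
b = centroid_align(energy_normalize(np.roll(x, 30)))
```

After both steps, the two shifted observations `a` and `b` coincide and have unit energy, which is the invariance the pre-processing is meant to provide before the profiles are fed to the network.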
As shown in Figure 9, the classification accuracy of all algorithms increases with the number of training samples. However, deep architecture algorithms (e.g., SAE, DDAEs, and the proposed method) are more accurate than shallow architecture algorithms (e.g., PCA, MTL TSB-HMMs, and ELM). The traditional recognition algorithms rely on the experience of the researchers and require a complete set of training samples to ensure excellent recognition performance. Because of their shallow architecture, these algorithms cannot effectively separate the intrinsic class information of the target from external factors in the feature space. The deep-structure algorithms lose as little of the inherent class information of the target as possible while disentangling the coupling between the various factors layer by layer. More intuitively, the low-level features in a deep network are usually distributed and can be shared among different classes, while the high-level features are usually more abstract and more separable. Therefore, better generalization performance is a great advantage of deep networks. Because the proposed method not only obtains a deep feature representation of radar HRRP but also makes better use of the target category information, its classification performance is better than that of SAE and DDAEs. In addition, when the training set is smaller, the classification performance of the proposed method is better than that of the other algorithms, which shows that the proposed method has better generalization performance.
The classification accuracy of the different algorithms when each target has 3500 training samples is listed in Table 2. With 3500 training samples, the accuracy of the proposed algorithm reaches 95.01%, which is 0.22 percentage points higher than that of the DDAE algorithm and 1.5 percentage points higher than that of the SAE. The accuracy of the shallow structure algorithms does not exceed 90%. It can be concluded that the proposed method can obtain better classification performance when only a small number of training samples are available.
When the number of training samples for each target is 3500, the classification accuracy of different algorithms is listed in Table 2. It can be seen from the table that when the number of training samples is 3500, the accuracy of the proposed algorithm reaches 95.01%, which is 0.22% higher than that of the DDAE algorithm, and 1.5% higher than that of the SAE. The accuracy of the shallow structure algorithms is not more than 90%. It can be concluded that the proposed method can obtain better classification performance when there is only a small amount of training samples available.     As shown in Table 3, the training time of SAE, DDAEs, and the proposed method are compared. The proposed method is almost five times faster than SAE in training time; that is because we need to add the softmax regression classifier to the top encoding layer of SAE, and the last step of training is to fine-tune the parameters of all layers to achieve the desired classification performance. This process will take a lot of time. The proposed method adds ELM with faster learning speed and less required tuning parameters to the top layer of SAE as a classifier. The proposed method does not need to fine-tune the parameters of all layers, thus reducing the network training steps and training time. SAE and DDAEs are similar in training time because their network structures are the same. It can be seen from Figure 10 that the classification accuracy of ELM becomes much better as the hidden nodes increase. When the number of hidden nodes is 1500, the classification accuracy is 89.01%. When the number of hidden nodes increases to 4000, ELM reaches an accuracy of 90.01%. After that, the value is almost unchanged all the time because the ELM is in an over-fitting state. Therefore, we know that in order to get a better classification effect, ELM needs more hidden nodes, which will make the network structure more complex. 
As Table 2 shows, the proposed method needs only 50 hidden nodes to reach an accuracy of 95.01% when it uses the regularized ELM for classification. Therefore, the proposed method can effectively reduce the number of hidden nodes of ELM and simplify the network structure.
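To illustrate why an ELM classifier avoids the fine-tuning step, the following minimal sketch trains a standard ridge-regularized ELM in closed form. It is not the class-information regularizer proposed in this paper; the function names (train_elm, predict_elm), the parameters (n_hidden, reg_c), and the synthetic two-dimensional features standing in for the SAE output are all assumptions made for illustration.

```python
import numpy as np


def train_elm(features, labels, n_hidden=50, reg_c=1.0, seed=0):
    """Fit a ridge-regularized ELM classifier on fixed input features."""
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    # Input weights and hidden biases are drawn at random and never tuned.
    w_in = rng.standard_normal((features.shape[1], n_hidden))
    bias = rng.standard_normal(n_hidden)
    hidden = 1.0 / (1.0 + np.exp(-(features @ w_in + bias)))  # sigmoid layer
    targets = np.eye(n_classes)[labels]  # one-hot class targets
    # Output weights in closed form: beta = (H^T H + I / C)^{-1} H^T T
    gram = hidden.T @ hidden + np.eye(n_hidden) / reg_c
    beta = np.linalg.solve(gram, hidden.T @ targets)
    return w_in, bias, beta


def predict_elm(model, features):
    """Return the predicted class index for each row of features."""
    w_in, bias, beta = model
    hidden = 1.0 / (1.0 + np.exp(-(features @ w_in + bias)))
    return np.argmax(hidden @ beta, axis=1)


# Synthetic stand-in for SAE features: three well-separated clusters.
demo_rng = np.random.default_rng(1)
centers = np.array([[4.0, 0.0], [-4.0, 0.0], [0.0, 4.0]])
labels = np.repeat(np.arange(3), 100)
features = centers[labels] + 0.5 * demo_rng.standard_normal((300, 2))

model = train_elm(features, labels, n_hidden=50)
accuracy = np.mean(predict_elm(model, features) == labels)
```

The only trained quantity is the output weight matrix, which is obtained from a single linear solve of size n_hidden, so no iterative back-propagation through the feature layers is involved in this sketch.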

Conclusions
In this paper, we have proposed a novel radar HRRP target recognition method based on SAE and regularized ELM. SAE, as an important component of deep learning structures, can extract deep features and mine the essential information of radar HRRP, which benefits recognition. ELM is also useful for recognition because of its fast learning speed and good generalization performance. Experimental results show that the proposed method not only reduces the network training time but also allows the ELM to achieve high recognition accuracy with fewer hidden nodes. In addition, the proposed method obtains good recognition performance even when only a small number of training samples is available. However, in real situations the training samples are usually obtained at a high signal-to-noise ratio (SNR) through cooperative measurement experiments, while the test samples are usually acquired under non-cooperative circumstances, where a high SNR cannot be guaranteed because of the severe measurement conditions. Thus, it is important to optimize the proposed method to match the noise level of the received test samples in the recognition stage. A stacked denoising sparse autoencoder (sDSAE) can effectively eliminate the influence of noise. Therefore, in the near future, we will consider combining sDSAE with ELM to solve this problem.
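The SNR mismatch described above could be examined by corrupting high-SNR profiles to a chosen SNR before testing. The sketch below is one simple way to do this and is not part of the reported experiments. It assumes real-valued amplitude profiles and additive white Gaussian noise, which simplifies the complex-valued noise of a real radar echo; the function name add_noise_at_snr and the random 256-cell profile are hypothetical.

```python
import numpy as np


def add_noise_at_snr(profile, snr_db, rng):
    """Add white Gaussian noise whose expected power matches snr_db."""
    signal_power = np.mean(profile ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.standard_normal(profile.shape) * np.sqrt(noise_power)
    return profile + noise


rng = np.random.default_rng(0)
clean = np.abs(rng.standard_normal(256))  # hypothetical stand-in for one HRRP
noisy = add_noise_at_snr(clean, snr_db=10.0, rng=rng)

# Empirical SNR of this realization; it fluctuates around the requested value.
measured_snr_db = 10.0 * np.log10(
    np.mean(clean ** 2) / np.mean((noisy - clean) ** 2)
)
```

Sweeping snr_db over a range of values and re-evaluating a trained classifier on the corrupted test profiles would give a rough picture of how recognition accuracy degrades with noise.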