A Deep Learning-Based Satellite Target Recognition Method Using Radar Data

A novel satellite target recognition method based on radar data partition and deep learning techniques is proposed in this paper. For the radar satellite recognition task, orbital altitude is introduced as a distinct and accessible feature to divide radar data. On this basis, we design a new distance metric for HRRPs called normalized angular distance divided by correlation coefficient (NADDCC), and a hierarchical clustering method based on this distance metric is applied to segment the radar observation angular domain. Using the above technology, the radar data partition is completed and multiple HRRP data clusters are obtained. To further mine the essential features in HRRPs, a GRU-SVM model is designed and firstly applied for radar HRRP target recognition. It consists of a multi-layer GRU neural network as a deep feature extractor and linear SVM as a classifier. By training, GRU neural network successfully extracts effective and highly distinguishable features of HRRPs, and feature visualization technology shows its advantages. Furthermore, the performance testing and comparison experiments also demonstrate that GRU neural network possesses better comprehensive performance for HRRP target recognition than LSTM neural network and conventional RNN, and the recognition performance of our method is almost better than that of other several common feature extraction methods or no data partition.


Introduction
Space target recognition is a primary function of space surveillance information systems, and satellite recognition is of critical importance on this study, especially for observation satellites. However, few open research achievements have been reported. The difficulty of this problem is that satellites are simply too small or too far away for detailed information to be recognized, and it is relatively hard to obtain effective identification data. With the development of wideband radar, not only can we locate the target position, but also get other useful radar data of targets, such as high-resolution range profile (HRRP) and inverse synthetic aperture radar (ISAR) image [1,2]. An HRRP is the phasor sum of the time returns from different scatterers on the target located within a resolution cell [3], which represents the projection of the complex returned echoes from the target scattering centers onto the range axis [4]. It contains lots of geometric structure information about the target down range, such as scatterers distribution and target size. In addition, compared with ISAR image, HRRPs have the advantages of easy acquisition, storage and processing. That's why radar HRRP target recognition has gained high attention from radar automatic target recognition (RATR) community [5][6][7]. In summary, this paper focuses on the satellite target recognition based on radar data and proposes a novel recognition method, which mainly consists of data partition and deep learning model. Feature extraction and selection is a basic and crucial technology for radar target recognition research. It is significant to adopt reasonable and effective features to improve recognition performance. Currently, the targets of HRRP recognition research are basically about ground or aviation targets, such as tanks and airplanes [8][9][10][11][12][13][14][15]. Satellites, as an important space target, have different motion characteristics, one of which is that their motion must follow Kepler's law. Besides, because the fuel carried by satellites is limited, the orbital maneuver range of most satellites is restricted, which makes satellite orbits relatively stable [16,17]. Orbit information is easily accessible and non-burdensome because target range and position measurement is a basic function of radar, therefore, orbit information is a distinct and accessible feature for recognizing the satellites whose eccentricity is very small, and could be introduced into radar data partition when multiple satellites need to be recognized. In addition, radar observation pitch and azimuth angles are also important and helpful information. Utilizing radar observation angles has many benefits for target recognition, such as reducing search range and computation, relaxing attitude sensitivity and improving recognition rate [18,19]. Radar observation angular domain division have been studied in [8,11,20,21]. Uniform frame segmentation method is used in [8,11] but has been proved simple and unreasonable [22]. Angular domain division method based on statistical characteristics is studied in [20], but it would generate many matching templates and need huge computation burden and storage requirement. Correlation coefficient is firstly introduced to solve this problem in [21] and could measure similarity between HRRPs to a certain extent. However, other information may also be helpful to measure similarity and could be applied, such as angular distance. Therefore, for the data partition, the proposed method will utilize orbit information as a powerful feature, and a hierarchical clustering method with a novel distance metric, namely normalized angular distance divided by correlation coefficient (NADDCC), will be applied to segment radar angular domain.
In addition to the above available satellite and radar information, it is still necessary to further mine information in HRRP data to raise the recognition accuracy. Many studies have been done on HRRP feature extraction and selection methods. In the early days, researchers often calculated FFT-magnitude, power spectrum and a variety of high-order spectrum of HRRP data, and used them as the features of classifier for target recognition [8][9][10]. Although these engineered features could play a part in target recognition, they are dependent on researchers' experience and techniques. Other than the features of artificial selection, machine learning algorithms have been widely utilized to extract features based on high-dimensional HRRP data [11][12][13][14][15]. Principal component analysis (PCA) is applied to extract the complex HRRPs' feature subspace within each target-aspect sector in the literature [11]. Dictionary learning is adopted to extract the features of HRRP data, which possess high noise-robust and discrimination [12,13]. Manifold learning is employed to reduce the feature dimensions of radar HRRP [14,15]. These methods can extract appropriate features in some cases, but they are all shallow architectures that may not represent the essence of radar HRRP. Thereby, how to automatically extract the deep abstract features, which can play an important role in target recognition, has become a significant research issue.
The deep learning theory [23] advanced by Hinton could solve the abovementioned problem effectively. Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction [24]. Because of its powerful recognition or prediction ability, deep learning methods have consistently been used in many applications, such as computer vision [25], speech recognition [26], networking and health care [27][28][29]. Some deep learning structures applied in several recent papers have been demonstrated useful for radar target recognition, such as autoencoder and its varieties [30,31], convolutional neural network (CNN) [32] and recurrent neural network (RNN) [33,34]. Due to the unique structure of RNN, it has been widely applied to process sequential data, such as action recognition [35], scene labelling [36], and language processing [37], and has achieved impressive results [38]. However, it is founded that simple conventional RNN could not learn on a wide range of dependencies, and gradients of distant time steps do not play a role in learning because of gradient vanishing. To solve this problem, long short-term memory (LSTM) and gated recurrent unit (GRU) architecture have been designed to learn long term reliance on sequential data. GRU is a variety of LSTM and retains the LSTM's resistance to the problem of gradient vanishing. Moreover, the internal structure of GRU is simple and it requires less computation to update the hidden state, which makes the training speed faster [37,39,40]. So GRU neural network will be applied in this satellite recognition method to extract and select the deep abstract features of radar sequential HRRP data.
In this paper, a novel satellite target recognition method is proposed based on radar data. This method could make full use of radar data and apply deep learning technology to extract highly distinguishable features. Its features are summarized as follows: (a) Satellite orbital altitude is introduced as a distinct and accessible feature for radar data partition; (b) Radar observation angles information is fully utilized, and a hierarchical clustering method based on a new distance metric (NADDCC) is applied to segment angular domain to improve recognition rate; (c) A novel end-to-end GRU-SVM model is designed, which uses radar HRRP data as input and target class as output, and firstly applied to identify targets based on radar HRRP data. GRU neural network is applied as a deep features extractor and support vector machine (SVM) is used as a classifier in this model. Compared with some deep neural networks such as CNN, autoencoder and denoising autoencoder, and shallow learning algorithms such as PCA [11], dictionary learning [12,13] and manifold learning [14,15], the presented model can extract the deep abstract features of HRRPs and obtain better recognition results.
The rest of this paper is organized as follows: in Section 2, the information contained in satellite orbit and radar data is analysed. In Section 3, we introduce GRU, SVM and the construction of the GRU-SVM model, and present the overall flow chart of this recognition method. In Section 4, recognition results are provided and the performance under different recognition methods and conditions is compared and analysed. In Section 5, some conclusions are drawn.
Notations: To simplify the presentation, we define the following notations used in this paper. We use bold lower case letters to represent a vector, e.g., µ ∈ C D , and use bold upper case letters to represent a matrix, e.g., M ∈ C i×j . The acronyms used in this paper are summarized in Table 1 for the sake of readability.

Description and Preprocessing of HRRP
HRRP is the amplitude of echo summation for target scattering centers in each range cell of wideband radar. Figure 1 shows the illustration of a HRRP sample from a satellite target. High resolution radar operates in microwave frequency band, and the size of targets or their components is much larger than the wavelength of radar. In this case, the echo characteristics of targets can be calculated by using a simplified scattering center model [3,4,9,41,42]. Therefore, for complex targets such as a satellite, the projection of an object on radar line of sight can be divided into many range cells by high resolution radar. According to the scattering center model, the scatterers in different range cells will rotate in the same way when a satellite target rotates, which causes that the echo amplitudes between range cells have certain correlation. Besides, windowing processing of the returned echoes before getting HRRP data and multiple reflections phenomena of measured HRRP data would enhance the correlation between adjacent range cells. Therefore, there is a certain temporal correlation between range cells and radar HRRP could be seen as sequential data, which is suitable to be learned by RNN-based neural networks.

Description and Preprocessing of HRRP
HRRP is the amplitude of echo summation for target scattering centers in each range cell of wideband radar. Figure 1 shows the illustration of a HRRP sample from a satellite target. High resolution radar operates in microwave frequency band, and the size of targets or their components is much larger than the wavelength of radar. In this case, the echo characteristics of targets can be calculated by using a simplified scattering center model [3,4,9,41,42]. Therefore, for complex targets such as a satellite, the projection of an object on radar line of sight can be divided into many range cells by high resolution radar. According to the scattering center model, the scatterers in different range cells will rotate in the same way when a satellite target rotates, which causes that the echo amplitudes between range cells have certain correlation. Besides, windowing processing of the returned echoes before getting HRRP data and multiple reflections phenomena of measured HRRP data would enhance the correlation between adjacent range cells. Therefore, there is a certain temporal correlation between range cells and radar HRRP could be seen as sequential data, which is suitable to be learned by RNN-based neural networks. The radar signatures, which return from multiple scattering centers within the same range cell, will be coherently summed as a single signature for that range cell. According to related literature [3,9], suppose the transmitted signal is s t e j2πf c t , the n-th complex echo in the d-th range cell (d=1, 2, ⋯, D) in the baseband can be approximated as: where s t is the complex envelop which could be approximated as unchanged for all scatterers in one range cell. λ represents the wavelength of wideband radar and f c is the carrier frequency of radar signal. L d denotes the number of scatterers in the d-th range cell. σ di represents the intensity of the i-th scatterer in the d th range cell. R n is the radial distance between the radar and target reference center in the n-th echo. Δγ di n is the radial displacement of the i-th scatterer of the d-th range cell in the n-th echo. Usually, s t is a rectangular pulse signal with unit intensity and could be omitted. After eliminating the initial phase of the n-th echo e -j 4π λ ⁄ R n , the n-th HRRP can be defined as: x n = x 1 n ,x 2 n ,⋯,x D n = σ 1i e jϕ 1i n Several sensitivity issues of HRRP should be focused on when the HRRP target recognition task is carried out. The first one is time-shift sensitivity. To decrease the computation complexity, HRRP The radar signatures, which return from multiple scattering centers within the same range cell, will be coherently summed as a single signature for that range cell. According to related literature [3,9], suppose the transmitted signal is s(t)e j2πf c t , the n − th complex echo in the d − th range cell (d = 1, 2, · · · , D) in the baseband can be approximated as: where s(t) is the complex envelop which could be approximated as unchanged for all scatterers in one range cell. λ represents the wavelength of wideband radar and f c is the carrier frequency of radar signal. L d denotes the number of scatterers in the d − th range cell. σ di represents the intensity of the i − th scatterer in the d − th range cell. R(n) is the radial distance between the radar and target reference center in the n − th echo. ∆γ di (n) is the radial displacement of the i − th scatterer of the d − th range cell in the n − th echo. Usually, s(t) is a rectangular pulse signal with unit intensity and could be omitted. After eliminating the initial phase of the n − th echo e −j(4π/λ)R(n) , the n − th HRRP can be defined as: Several sensitivity issues of HRRP should be focused on when the HRRP target recognition task is carried out. The first one is time-shift sensitivity. To decrease the computation complexity, HRRP is only a portion of received radar echo extracted by a range window, which contains the target signal. Thus the position of the target signal in HRRP would change with the measurement. However, it would be better for feature learning that all the training samples meet a uniform parameter model. So we adopt envelope alignment method [43] as time-shift compensation technique in this paper, which is achieved based on the summation average of multiple HRRPs cross-correlations. The second one is amplitude-scale sensitivity. It is caused by the fact that many factors could influence the intensity of an HRRP, such as target distance, radar transmitting power, radar antenna gain, radar receiver gain and radar system losses. It makes that HRRPs measured by different radars or under different conditions would have different amplitude-scales. In order to deal with amplitude scale sensitivity, each HRRP is normalized by energy normalization method [3,9]. Suppose an HRRP is defined as x(n) = [x 1 (n), x 2 (n), · · · , x D (n)], then its energy normalization preprocessing result x(n) is shown as follows: After the above preprocessing step, the HRRP sample examples of satellites are shown in Figure 2. The last and toughest one is called target-attitude sensitivity. This issue will be analyzed in detail in Section 2.3 and further alleviated. is only a portion of received radar echo extracted by a range window, which contains the target signal. Thus the position of the target signal in HRRP would change with the measurement. However, it would be better for feature learning that all the training samples meet a uniform parameter model. So we adopt envelope alignment method [43] as time-shift compensation technique in this paper, which is achieved based on the summation average of multiple HRRPs cross-correlations. The second one is amplitude-scale sensitivity. It is caused by the fact that many factors could influence the intensity of an HRRP, such as target distance, radar transmitting power, radar antenna gain, radar receiver gain and radar system losses. It makes that HRRPs measured by different radars or under different conditions would have different amplitude-scales. In order to deal with amplitude scale sensitivity, each HRRP is normalized by energy normalization method [3,9]. Suppose an HRRP is defined as x n = x 1 n ,x 2 n ,⋯,x D n , then its energy normalization preprocessing result n is shown as follows: After the above preprocessing step, the HRRP sample examples of satellites are shown in Figure 2. The last and toughest one is called target-attitude sensitivity. This issue will be analyzed in detail in Section 2.3 and further alleviated.

Analysis and Statistics of Satellite Orbit
Currently, thousands of artificial satellites move around the Earth for the purposes of communication and navigation, information relay, missile warning, on-orbit service and so on. Among them, observation satellites are the key targets, which need to be identified for space surveillance information systems. Unlike ground or aviation targets, satellites' motion must follow Kepler's law, that is, they basically move along certain orbit. As shown in Figure 3a, satellite orbit is mainly described with the following parameters: (1) Semi-major axis (a): the distance from the apogee or perigee to orbital center which describes the size of satellite orbit; (2) Eccentricity (e): describes the shape of satellite orbit; (3) Inclination (i): the angle between the orbital plane and the equatorial plane which determines the accessible area of a satellite; (4) Longitude of ascending node (Ω): the right ascension at the intersection of the orbital plane and the equatorial plane. (5) Argument of perigee (ω): angular distance of orbit perigee and ascending node. Based on the above parameters, the altitude range of a satellite can be computed by:

Analysis and Statistics of Satellite Orbit
Currently, thousands of artificial satellites move around the Earth for the purposes of communication and navigation, information relay, missile warning, on-orbit service and so on. Among them, observation satellites are the key targets, which need to be identified for space surveillance information systems. Unlike ground or aviation targets, satellites' motion must follow Kepler's law, that is, they basically move along certain orbit. As shown in Figure 3a, satellite orbit is mainly described with the following parameters: (1) Semi-major axis (a): the distance from the apogee or perigee to orbital center which describes the size of satellite orbit; (2) Eccentricity (e): describes the shape of satellite orbit; (3) Inclination (i): the angle between the orbital plane and the equatorial plane which determines the accessible area of a satellite; (4) Longitude of ascending node (Ω): the right ascension at the intersection of the orbital plane and the equatorial plane. Based on the above parameters, the altitude range of a satellite can be computed by: where H max and H min respectively refer to the maximum and minimum altitude of a satellite. R e denotes the radius of Earth. In this way, the variation of satellite altitude difference with eccentricity can be calculated when the value of semi-major axis is given, as shown in Figure 3b. It can be seen that only when eccentricity is small enough can the orbital altitude information be used for identification, so it is still necessary to investigate the eccentricity distribution of current observation satellites. UCS Satellite Database [44] has made detailed statistics of satellites currently orbiting Earth. Eccentricity and apogee altitude statistical results of observation satellites are shown in Figure 3c,d. It shows that the eccentricity of most satellites is lower than 0.01, especially for optical or radar imaging satellites. Therefore, orbital altitude information is an available and useful feature for recognizing observation satellites.
where H max and H min respectively refer to the maximum and minimum altitude of a satellite. R e denotes the radius of Earth. In this way, the variation of satellite altitude difference with eccentricity can be calculated when the value of semi-major axis is given, as shown in Figure 3b. It can be seen that only when eccentricity is small enough can the orbital altitude information be used for identification, so it is still necessary to investigate the eccentricity distribution of current observation satellites. UCS Satellite Database [44] has made detailed statistics of satellites currently orbiting Earth. Eccentricity and apogee altitude statistical results of observation satellites are shown in Figures 3c,d. It shows that the eccentricity of most satellites is lower than 0.01, especially for optical or radar imaging satellites. Therefore, orbital altitude information is an available and useful feature for recognizing observation satellites. For non-cooperative satellite targets, it is difficult to compute orbital altitude by orbital parameters because orbital parameters are not necessarily known or constant. However, it may become easy when radar is applied. For radar, distance measurement is a basic function and its observation angles are known. Satellite orbital altitude could be obtained by multiple coordinate transformation based on radar measurement data and the main coordinate transformation diagrams are shown in Figure 4. The detailed orbital altitude calculation process is shown in Appendix A.

Target-Attitude Sensitivity and Radar Observation Angular Domain Segmentation
Target-attitude sensitivity is one of the most difficult problems in radar HRRP target recognition research. According to the scattering center model [3,4,9,41,42], the variation of target attitude will lead to different range shifts for different scattering centers on the target, even within the attitude region where the scattering center structure remains unchanged (that is, without migration throuth resolution cells, MTRC). Specifically, for the HRRP of the m − th returned echo, suppose it is , then the echo power of the n − th range cell could be computed by: where * represents complex conjugate operation. σ ni denotes the intensity of the i − th scatterer in the n − th range cell. L n i=1 σ 2 ni is the conjugate product of the sub-echo for all scatterers in the n − th range cell, which represents the intensity sum of each scatterer and is relatively stable. However, cos[θ nik (m)] of the second item, which is called cross-term, will change with the variation of m, where θ nik (m) represents the phase difference of the i − th scatterer and the k − th scatterer of the n − th range cell in the m − th returned echo. Therefore, an HRRP, which is the amplitude of coherent sum of the complex returned echoes from scatterers in a range cell, can be changed substantially.
The target-attitude sensitivity problem makes it hard to recognize satellite targets base on HRRP data and needs to be focused on. It has been founded that average range profile [22] is helpful to improve attitude stability of HRRP, because the sum of the irrelevant echo power of the cross-terms will greatly weaken the effect of cross-items. As mentioned earlier, the correlation coefficient between average range profiles can be used to measure similarity of HRRPs [21]. For a set of HRRP samples x(0), x(1), · · · , x(M − 1) , which are translationally aligned and without MTRC, we represent them as M − 1, then its average range profile is [22] defined as: Research also suggests that utilizing radar observation angles information is beneficial, such as reducing search range and computation, relaxing target-attitude sensitivity and improving recognition rate [18,19].
Through the above analysis, for the target-attitude sensitivity problem of HRRP, we propose a hierarchical clustering method with a novel distance metric, namely the normalized angular distance divided by correlation coefficient (NADDCC), to segment radar observation angular domain. This distance metric includes both angular distance information and correlation coefficient between HRRP average range profiles, which could measure similarity between HRRP average range profiles better. For average range profiles µ i = [µ i (1), µ i (2), · · · , µ i (D)] whose observation azimuth and elevation angles are θ i and ε i , and µ j = µ j (1), µ j (2), · · · , µ j (D) whose observation azimuth and elevation angles are θ j and ε j , their angular distance d angle µ i , µ j and correlation coefficient ρ µ i , µ j are defined as follows: The larger the correlation coefficient and the smaller the angular distance, the higher the similarity of HRRP average range profiles. In addition, the distance metric applied for hierarchical clustering usually need to satisfy some necessary properties, such as non-negativity, identity and symmetry. In order to make the two distances work equally, they are normalized by means of dividing by their respective maximum. Taking into account the above considerations, the distance metric presented in this paper is designed as follows: Hierarchical clustering algorithm adopts bottom-up aggregation strategy. Firstly, each HRRP average range profile is regarded as an initial cluster, and then two nearest clusters are found and merged at each step of algorithm operation. The process is repeated until the number of preset clusters is reached. The clustering process can be summarized as follows:

Hierarchical Clustering Algorithm
Input: HRRP average range profile sample set S = µ 1 , µ 2 , · · · , µ m ; distance metric d avg C i , C j = 1 |C i ||C j | µ i ∈C i µ j ∈C j d µ i , µ j ; cluster number k. Process: for j = 1, 2, · · · , m do C j = µ j end for for i = 1, 2, · · · , m do for j = i + 1, · · · , m do M(i, j) = d avg C i , C j ; M(j, i) = M(i, j) end for end for set the number of current clusters: q = m while q > k do find the two nearest cluster C i * and C j * ; merge C i * and C j * : C i * = C i * ∪ C j * ; for j = j * +1, j * +2, · · · , q do renumber cluster C j as C j−1 end for delete line j * and column j * of matrix M for j = i + 1, · · · , q − 1 do M(i * , j) = d avg C i * , C j ; M(j, i * ) = M(i * , j) end for q = q − 1 end while Output: clusters C = {C 1 , C 2 , · · · , C k }

GRU-SVM Model
The designed GRU-SVM model is a combination of a GRU neural network as a deep feature extractor and SVM as a classifier. It makes the best of the advantages of the GRU deep neural network and SVM to extract the deep abstract features and complete an accurate classification. This model is described in detail below.

GRU
As mentioned above, GRU is designed in [37,45] to learn long term reliance on sequential data, and its overall performance is better than LSTM and simple conventional RNN [39,40]. Figure 5 shows a GRU model. There are only two gates in a GRU, namely the update gate z and the reset gate r. The update gate is utilized to modulate the previous information inside the unit. The larger the value of update gate, the more the status information of the previous moment insides. The reset door is used to control how much previous state information will be forgotten. The smaller the value of the reset gate, the more the previous state information is forgotten. The designed GRU-SVM model is a combination of a GRU neural network as a deep feature extractor and SVM as a classifier. It makes the best of the advantages of the GRU deep neural network and SVM to extract the deep abstract features and complete an accurate classification. This model is described in detail below.

GRU
As mentioned above, GRU is designed in [37,45] to learn long term reliance on sequential data, and its overall performance is better than LSTM and simple conventional RNN [39,40]. Figure 5 shows a GRU model. There are only two gates in a GRU, namely the update gate z and the reset gate r. The update gate is utilized to modulate the previous information inside the unit. The larger the value of update gate, the more the status information of the previous moment insides. The reset door is used to control how much previous state information will be forgotten. The smaller the value of the reset gate, the more the previous state information is forgotten. The update gaze z t and reset gate r t at the time t are defined as: where W and U are weight matrices. x denotes input data. The hidden state h t and candidate hidden state h t in GRU are calculated respectively as follows: where * represents element-wise product. The σ · and tanh · are two different activation functions which can be defined as: In this paper, GRU is employed in the GRU neural network to extract effective features based on HRRP sequential data.

SVM
The support vector machine (SVM) was developed by Vapnik [46] for binary classification. Its objective is to find the optimal hyper-plane f w,x = w · x + b to separate two classes in a given dataset, where x is the feature vector. SVM learns the parameters w and b by solving the following constrained optimization problem: The update gaze z t and reset gate r t at the time t are defined as: where W and U are weight matrices. x denotes input data. The hidden state h t and candidate hidden state h t in GRU are calculated respectively as follows: where * represents element-wise product. The σ(·) and tan h(·) are two different activation functions which can be defined as: 1 + e 2x (12) In this paper, GRU is employed in the GRU neural network to extract effective features based on HRRP sequential data.

SVM
The support vector machine (SVM) was developed by Vapnik [46] for binary classification. Its objective is to find the optimal hyper-plane f(w, x) = w · x + b to separate two classes in a given dataset, where x is the feature vector. SVM learns the parameters w and b by solving the following constrained optimization problem: where w T w is the Manhattan norm, C is the penalty parameter, and ξ is the cost function. The corresponding unconstrained optimization problem of Equation (18) is as follows: where y is the actual label, and w T x + b is the predictor function. This equation is known as L1-SVM, with the standard hinge loss. Its differential counterpart L2-SVM is given by the following equation: where w 2 is the Euclidean norm (also known as L2 norm), with the squared hinge loss. Despite being intended for binary classification, SVM may be used for multi-classification as well. One approach to achieve this is the use of kernel tricks, which convert a linear model into a non-linear model by applying kernel functions. However, we just use LSVM instead of utilizing kernel tricks in this paper, because LSVM does not employ any feature extraction and transformation and can serve as a simple baseline for evaluating the quality of extracted features. A one-vs-one scheme is employed to achieve multi-classification in this paper, which establishes the binomial classifier for every two classification.

GRU-SVM Model Construction
In this paper, a novel end-to-end GRU-SVM model has been designed and firstly applied to recognize targets based on radar HRRP data. The structure of GRU-SVM model is shown in Figure 6. The composition of this model includes two parts: GRU neural network as a feature extractor and LSVM as a classifier. GRU neural network is constituted of input layer, four hidden layers and output layer, where GRU hidden layer and fully connected layer (dense layer) are included. And the input layer, two GRU hidden layers and a fully connected layer make up the encoder module, whose output are defined as the features extracted by GRU neural network. In order to make the training model more accurate, we apply the bidirectional scheme demonstrated in reference [47] in GRU hidden layers (see Bid-GRU layer in Figure 6). In addition, two fully connected layers have been employed after the encoder to extract good features. And the last layer, also called output layer, would output the satellite classifications by adopting softmax activation function. That is, GRU neural network is trained in a supervised way. The reasons why we choose to apply the softmax classifier to train GRU neural network are that it's an excellent multi-class classifier and its common loss function, namely categorical cross-entropy loss, is more sensitive to classification output than the hinge loss of LSVM, which means that it is always optimizing the network parameters to reduce loss during training. Therefore, the utilization of softmax classifier may make the features extracted by the encoder more highly distinguishable. By training GRU neural network, the encoder will produce the length-fixed feature vectors which contain sufficient information for target recognition. LSVM classifier takes feature vectors as input and produce classification results. It mainly has two roles: one the one hand, LSVM classifier could have outstanding generalization performance for the testing data after it is trained with the extracted features and their corresponding labels; on the other hand, LSVM classifier is applied as a simple baseline for evaluating the quality of features extracted by different methods here, because it does not employ any feature extraction and transformation. However, the softmax classifier in this GRU neural network may be not suitable for comparing the quality of extracted features on account of the existence of nonlinear activation function. Linear ϕ(·), relu φ(·) and softmax ψ(·) activation functions are employed in this model and defined as follows: h  After the analysis and description of Section 2 and Section 3, the overall framework of this proposed satellite target recognition method is shown in Figure 7. It can be divided into three parts, namely training process, testing process and the methods and techniques used therein. Orbital altitude calculation and observation angular domain segmentation are applied to divide HRRP data, including training samples and testing samples. The detailed orbital altitude calculation process is shown in the Appendix A. On this foundation, GRU neural network is trained to get the deep abstract features based on divided HRRP training data. Then, the classification results of testing data could be obtained by the trained GRU neural network and LSVM.  After the analysis and description of Sections 2 and 3, the overall framework of this proposed satellite target recognition method is shown in Figure 7. It can be divided into three parts, namely training process, testing process and the methods and techniques used therein. Orbital altitude calculation and observation angular domain segmentation are applied to divide HRRP data, including training samples and testing samples. The detailed orbital altitude calculation process is shown in the Appendix A. On this foundation, GRU neural network is trained to get the deep abstract features based on divided HRRP training data. Then, the classification results of testing data could be obtained by the trained GRU neural network and LSVM.
namely training process, testing process and the methods and techniques used therein. Orbital altitude calculation and observation angular domain segmentation are applied to divide HRRP data, including training samples and testing samples. The detailed orbital altitude calculation process is shown in the Appendix A. On this foundation, GRU neural network is trained to get the deep abstract features based on divided HRRP training data. Then, the classification results of testing data could be obtained by the trained GRU neural network and LSVM.

Experimental Results and Discussion
In this section, test experiments will be carried out to obtain the performance of the proposed recognition method. After dividing training data and completing the training process of GRU-SVM model, the recognition accuracy of testing data set will be gotten according to the testing process in Figure 7. Furthermore, performance testing and comparison experiments of different conditions and recognition methods also have been done to better illustrate the advantages of this recognition method.

Data Generation and Partition
Considering the difficulty of obtaining satellite HRRP measured data, we utilize reliable simulation radar HRRP data from ten satellites that simulated by an X-band radar with a center frequency of 10 GHZ and a bandwidth of 1 GHz. The main parameters of radar and these satellites are listed in Table 2 and the detailed flow chart of radar data generation is shown in Figure 8. The observational relationship between satellite and radar is calculated based on their parameters. Then, radar cross section (RCS) and echo are computed and radar data is obtained. In our experiments, each satellite target has 70,000 HRRP samples and each HRPP is a 300-dimensional vector. The data preprocessing operation has been done as described in Section 2.1. 80% of radar data will be applied as training set and others as testing set. For training set, data partition would be done by orbital altitude calculation and observation angular domain segmentation. Figure 9 shows the data partition results. The orbital altitude of this ten satellites is divided into three ranges in this paper, namely H Sat ≤ 500km, 500 < H Sat ≤ 1000km and H Sat > 1000km. Meanwhile, a hierarchical clustering method with a novel NADDCC distance metric has been implemented for HRRP data in each orbital altitude range to get clustering datasets. It should be noted that a set of radar observation angles is shared by continuous 700 radar HRRP data. Thus, we could be decide which cluster the test data belongs to when determining its altitude range and angles cluster.

Training Assessment of GRU Neural Network
After data partition, GRU neural network could be trained with these clustering datasets as input. In order to train the network faster and more accurately, we apply the following deep neural network training techniques in this paper: (1) 20% of training data is used as validation set to adjust the hyper-parameters; (2) Drop-out is employed for the two GRU layers to avoid the problem of overfitting and set to 0.25; (3) Batch normalization is inserted after each layer to accelerate the training.
For multi-classification problems, recognition accuracy, categorical cross-entropy loss Loss CC and mean absolute error (MAE) loss Loss MAE are often applied to assess classification result. They are defined as follows: where M is the total number of current training samples and N is the number of the samples predicted correctly.ŷ i denotes the prediction value of the ith sample and y i denotes expected value. m represents the class number and is usually greater than or equal to 3. In order to reduce the loss and improve accuracy in the training process, it is necessary to choose a suitable optimizer for GRU neural network. Adam is an excellent optimizer which combines the main advantages of the previous deep learning optimizers AdaGard and RMSProp. Thus, the Adam optimizer is used with the initial learning rate 1 × 10 −3 . When evaluation indicator is not improving, the learning rate will decrease in multiple.
The training records of 100 epochs are shown in Figure 10. It could be found that the training and validation accuracy raises with the increase of training epoch and converges to a high accuracy value. Correspondingly, the categorical cross-entropy loss and MAE of training and validation data decrease with the increase of training epoch. These results all confirm that GRU neural network is well trained for all training clustering datasets.
previous deep learning optimizers AdaGard and RMSProp. Thus, the Adam optimizer is used with the initial learning rate 1×10 -3 . When evaluation indicator is not improving, the learning rate will decrease in multiple. The training records of 100 epochs are shown in Figure 10. It could be found that the training and validation accuracy raises with the increase of training epoch and converges to a high accuracy value. Correspondingly, the categorical cross-entropy loss and MAE of training and validation data decrease with the increase of training epoch. These results all confirm that GRU neural network is well trained for all training clustering datasets.

Recognition Results and Comparative Analysis
The performance of this recognition method could be obtained on the basis of data partition and GRU neural network training, and a full comparative analysis of the recognition results under different conditions and recognition methods is also made in this section.

Classification Results of this Recognition Method
After completing radar data partition and GRU neural network training, a series of trained neural network models can be obtained based on training datasets of different clusters. According to the overall recognition framework in Figure 7, the orbital altitude calculation and observation angles assignment of testing data will be carried out, which is for determining which training model to extract their deep abstract features. Then, classification testing would be implemented by LSVM based on these features and corresponding recognition results of testing data could be gotten. The confusion matrices of these data tested by all training models are shown in Figure 11 and the corresponding recognition accuracy are listed in Table 3. It can be seen that testing data could be correctly divided into its corresponding trained model by orbital altitude calculation and observation

Recognition Results and Comparative Analysis
The performance of this recognition method could be obtained on the basis of data partition and GRU neural network training, and a full comparative analysis of the recognition results under different conditions and recognition methods is also made in this section.

Classification Results of this Recognition Method
After completing radar data partition and GRU neural network training, a series of trained neural network models can be obtained based on training datasets of different clusters. According to the overall recognition framework in Figure 7, the orbital altitude calculation and observation angles assignment of testing data will be carried out, which is for determining which training model to extract their deep abstract features. Then, classification testing would be implemented by LSVM based on these features and corresponding recognition results of testing data could be gotten. The confusion matrices of these data tested by all training models are shown in Figure 11 and the corresponding recognition accuracy are listed in Table 3. It can be seen that testing data could be correctly divided into its corresponding trained model by orbital altitude calculation and observation angles assignment. Meanwhile, these ten satellites can be well recognized and achieve a total accuracy of 99.2%.    In order to demonstrate the effect and advantages of this proposed recognition method, recognition performance for testing data is compared when different methods are applied to recongnize satellites under different conditions. Four deep neural networks, namely GRU neural network, CNN, autoencoder (AE) and denoising autoencoder (DAE), and three shallow models, including PCA, dictionary learning (DL) and manifold learning (ML), serve as feature extractor and LSVM is used to classify these satellites based on these extracted features. To ensure fairness in performance comparison, the number of some layers in these neural networks should be as same as possible, such as GRU hidden layer, CNN layer and encoder/decoder layer; These seven feature extraction methods could all reduce the 300-dimensional HRRP samples to same dimension, for example 64 dimensions in this paper. Table 5 shows the detailed recognition accuracy of these seven methods under various conditions and the corresponding statistical comparable results are shown in Figure 12. We can make the following conclusions from these two charts: (1) Orbital altitude calculation and observation angles clustering technology are favourable for improving the recognition rate of satellites for all seven methods, which verifies the validity of radar data partition; (2) Compared with the latter six methods, GRU-SVM model has good recognition performance for these ten satellites. Therefore, its total recognition accuracy rate is almost highest among these seven methods no matter whether orbital altitude calculation or observation angles clustering is applied.
Although the classification results of LSVM could prove the feature extraction validity of GRU neural network in some aspects, it is still expected to further display the distribution of extracted features for these seven recognition methods. Therefore, dimension reduction visualization technology is employed in this paper, which can map high dimensional feature data to two or three dimensions. At this time, the distribution of extracted features can be seen intuitively. Figure 13 shows dimension reduction distribution of features extracted based on one cluster training data (H Sat > 1000km, Cluster = No.3) for these seven methods. It can be found in Figure 13a that the GRU neural network has the best feature distribution result because the dimension reduction features of three satellites are separated from each other. However, other methods have more or less intersections between different satellite classes, especially the last five methods. It is confirmed that the features extracted by the GRU neural network are more effective and highly discriminative.    1 The first number represents whether to use orbital altitude calculation and the second represents whether to utilize observation angles clustering, for example '10′ denotes orbital altitude calculation has been applied but clustering is not applied.  1 The first number represents whether to use orbital altitude calculation and the second represents whether to utilize observation angles clustering, for example '10 denotes orbital altitude calculation has been applied but clustering is not applied.

Conclusions
In this paper, a novel satellite target recognition method based on radar data is proposed. It mainly includes two modules of data partition and a deep classification model. Satellite orbital altitude is introduced as a distinct and accessible feature because it is easy to calculate based on radar data and has relative stability. A hierarchical clustering method based on a new NADDCC distance metric is utilized to segment the radar observation angular domain. These two technologies can effectively complete data partition. Then, a GRU-SVM model is designed for radar HRRP satellite recognition, which is comprised of a deep GRU neural network as a feature extractor and LSVM as a classifier. The GRU neural network training records and feature visualization results all confirm that this GRU neural network could extract more deep and abstract features and these features have better separability. Furthermore, performance testing and comparison experiments also demonstrate that GRU neural network possesses better comprehensive performance for HRRP target recognition than LSTM neural network and conventional RNN; data partition can improve the recognition rate of satellites, and the recognition performance of our satellite target recognition method is almost better than that of other several common feature extraction methods or no data partition.
Author Contributions: W.L. proposed this recognition framework for radar satellite, performed the experiment and completed this manuscript. Y.Z. and C.L. revised this manuscript. C.X. and Y.H. provided helpful advice.

Acknowledgments:
The work in this paper has been supported by the National Natural Science Foundation of China (61304228).

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Satellite Orbital Altitude Calculation Method
Satellite orbital altitude could be obtained by multiple coordinate transformation based on radar measurement data. The Figure 4 shows these coordinate transformation diagrams. It mainly involves the following coordinate systems: (1) Geocentric Cartesian coordinate system: its origin O E is situated at the center of the Earth. O E X E axis is located in the equatorial plane and points to the meridian at the Greenwich Observatory. O E Z E axis is perpendicular to the equatorial plane, coincident with Earth's rotation axis and points to the north pole. O E Y E axis is located in the equatorial plane and its direction satisfies the right-handed rectangular coordinate system criterion. (2) Radar body Cartesian coordinate system: its origin S is situated at the radar station. SX S axis is located in the ground plane of the radar station and points to the south. SY S axis is located in the ground plane of the radar station and points to the east. SZ S axis passes the radar station, along its plumb line and points to the zenith.
The detailed coordinate transformation process is as follows: (a) Conversion of radar polar coordinate system into radar body Cartesian coordinate system Set the coordinates of satellite target in the radar polar coordinate system to (ρ, θ, ε), where ρ is the slant distance from satellite to radar station. θ, ε represent the radar observation azimuth and elevation angle, respectively. These parameters could be obtained in the process of radar measurement. Then, the coordinates x 1 , y 1 , z 1 of the satellite in the radar body Cartesian coordinate system could be calculated by the following formula: x 1 = ρ·cos θ·cos ε y 1 = ρ·sin θ·cos ε z 1 = ρ·sin ε (A1) (b) Conversion of geodetic coordinate system to geocentric Cartesian coordinate system Set the geodetic coordinates of the radar station are (λ S , ϕ S , H), where λ S , ϕ S , H represent the longitude, latitude and altitude of the radar station, respectively, which are known for certain radar station. Its corresponding coordinates (x , y , z ) in the geocentric Cartesian coordinate system can be computed by the following formula: x = (N + H)·cos λ S ·cos ϕ S y = (N + H)·sin λ S ·cos ϕ S z = N· 1 − e 2 +H ·sin ϕ S (A2) where N denotes the radius of prime vertical and is calculated by N = a/ 1 − e 2 sin 2 ϕ S . a, e represent the semi-major axis and oblateness of earth.
(c) Conversion of radar body Cartesian coordinate system to geocentric Cartesian coordinate system Set the coordinates of satellite in the radar body Cartesian coordinates system are x 1 , y 1 , z 1 , suppose its corresponding coordinates in the geocentric Cartesian coordinates system are x e , y e , z e , then the conversion formula is as: where R T is rotation matrix and calculated as: sin ϕ S cos λ S − sin λ S cos ϕ S cos λ S sin ϕ S sin λ S cos λ S cos ϕ S sin λ S − cos ϕ S 0 sin ϕ S Once the coordinates of satellite in the geocentric Cartesian coordinate system are known, it is easy to calculate the altitude of the satellite. The calculation formula is as follows: H sat = (x e ) 2 + y e 2 + (z e ) 2 − R e (A5)