SAR Target Classification Based on Sample Spectral Regularization

Abstract: Synthetic Aperture Radar (SAR) target classification is an important branch of SAR image interpretation. Deep learning based SAR target classification algorithms have made remarkable achievements. However, the acquisition and annotation of SAR target images are time-consuming and laborious, and sufficient training data are difficult to obtain in many cases. Insufficient training data can make deep learning based models suffer from over-fitting, which severely limits their wide application in SAR target classification. Motivated by this problem, this paper employs transfer learning to transfer the prior knowledge learned from a simulated SAR dataset to a real SAR dataset. To overcome the sample restriction problem caused by the poor feature discriminability of real SAR data, a simple and effective sample spectral regularization method is proposed, which regularizes the singular values of each SAR image feature to improve the feature discriminability. Based on the proposed regularization method, we design a transfer-learning pipeline that leverages the simulated SAR data and acquires better feature discriminability. The experimental results indicate that the proposed method is feasible for the sample restriction problem in SAR target classification. Furthermore, the proposed method can improve the classification accuracy when relatively sufficient training data are available, and it can be plugged into any convolutional neural network (CNN) based SAR classification model.


Introduction
Synthetic Aperture Radar (SAR) is an important earth observation system with all-day and all-weather capability, and it has been widely used in both the military and civil fields. However, due to the complicated characteristics of SAR images, human recognition of SAR targets is difficult and inefficient. Therefore, automatic target recognition (ATR) of SAR has become a very important research direction and attracted wide attention. A standard SAR ATR process can be divided into three steps: detection, discrimination and classification. The first two steps intend to extract potential target areas and remove false alarms. The third step is to use a classifier to distinguish each SAR target. In this paper, we mainly focus on the third stage, SAR target classification.
There are many studies on SAR target classification, which can mainly be divided into three categories: template-based methods [1][2][3], model-based methods [4][5][6][7] and machine learning based methods. Inspired by the above studies, we use transfer learning to solve the sample restriction problem in SAR target classification. A standard pipeline is to pre-train the model on a simulated SAR dataset with sufficient training data, then fine-tune the pre-trained model with limited real SAR data. But the low feature discriminability of real SAR data may further challenge the performance, especially with limited training data. The relation between feature transferability and discriminability is explored in [34]: the eigenvectors of the feature representations with large singular values dominate the feature transferability, while the eigenvectors with small singular values can provide extra feature discriminability. During training, the classifier relies on the salient features with large singular values, as they dominate the feature representations. The features with large singular values are strengthened and the features with small singular values are suppressed until the model enters the saturation area of the softmax activation, resulting in a loss of the feature discriminability carried by the small singular values.
Therefore, we propose spectral regularization of the feature representations to improve the feature discriminability by reducing the difference between the large and small singular values, and combine it with the standard transfer-learning pipeline, as shown in Figure 2. Concretely, the first proposed regularization, named sample spectral regularization (SSR), suppresses the large singular values of the feature of each sample. The second proposed regularization, named SSR†, explicitly encourages more feature discriminability by narrowing the gap between the large and small singular values. Employing spectral regularization at the sample level, i.e., regularizing the feature of each training sample, achieves better performance than the batch-level regularization of [34]. Beyond the sample restriction problem, the proposed spectral regularization can also improve the classification accuracy when sufficient training data is available.
Our contributions are threefold:
• We propose SSR and SSR† to improve the feature discriminability by reducing the difference between the large and small singular values. The proposed SSR operates at the sample level and achieves better feature discriminability than batch-level regularization, which makes it easier for the classifier to recognize targets of different classes.
• Based on the proposed regularization method, we propose a transfer-learning pipeline to solve the sample restriction problem in SAR target classification, which leverages the prior knowledge from the simulated SAR data and has better feature discriminability.
• We further investigate the differences among various spectral regularizations. The experimental results indicate that reducing the difference between the large and small singular values at the sample level is the most effective. Besides, we analyze the impact of spectral regularization on singular values.
The remainder of this paper is organized as follows. The details of the proposed method are described in Section 2. In Section 3, we conduct experiments to prove the effectiveness of the proposed method in the limited-data and sufficient-data regime and analyze the experimental results. Section 4 concludes this paper.

Method
In this section, we formulate the sample restriction problem for SAR target classification and address it by the proposed sample spectral regularization (SSR).

Problem Setting
The goal of this paper is to solve the sample restriction problem for SAR target classification. Let S = {(x_i^s, y_i^s)} denote a source dataset consisting of N_s labeled SAR target samples from C_s different classes, and T = {(x_i^t, y_i^t)} denote a target dataset consisting of N_t labeled SAR target samples from C_t different classes. Herein, S contains sufficient training samples, e.g., at least several hundred samples per class, while T only contains a small number of training samples, e.g., only several dozen samples per class. We employ a transfer-learning approach to transfer prior knowledge from the simulated SAR dataset S to the real SAR dataset T.

Feature Discriminability
As a key criterion to measure the representational capacity, discriminability refers to whether the model can successfully recognize different SAR target categories. Here, we revisit the relation between the feature discriminability and the singular values of the feature matrix. According to [34], the feature discriminability criterion can be formulated as follows:

J(D) = tr(D^T S_inter D) / tr(D^T S_intra D), (1)

where D is a matrix for dimension reduction. D* is the optimal solution of Equation (1) and can be formulated as:

D* = argmax_D J(D). (2)

S_inter and S_intra denote the inter-class variance and the intra-class variance, and can be calculated as follows:

S_inter = Σ_{j=1}^{c} n_j (µ_j − µ)(µ_j − µ)^T, (3)

S_intra = Σ_{j=1}^{c} Σ_{f ∈ F_j} (f − µ_j)(f − µ_j)^T, (4)

where there are c target classes, each class j has n_j examples, and f is the extracted deep feature. µ_j denotes the center feature of the j-th class, µ denotes the center feature of all classes, and F_j denotes all features in the j-th class. The optimal solution D* of the feature discriminability criterion can be calculated by the Singular Value Decomposition (SVD) as follows:

S_intra^{-1} S_inter = U Σ V^T. (5)

Thus, D* = U. A larger discriminability criterion indicates higher classification accuracy, and vice versa.

In the standard transfer-learning pipeline, the deep learning based classification model is pre-trained on the source dataset and then fine-tuned on the target dataset using the pre-trained parameters as an initialization. During both phases, the model is commonly optimized with a classification loss function, e.g., the cross entropy loss. However, training the model by only minimizing the classification loss cannot guarantee that the discriminability converges well to the optimal solution [34]. Therefore, we need to explicitly optimize the feature discriminability to improve the performance of a SAR target classification model trained with insufficient data.
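To make the scatter definitions above concrete, the following NumPy sketch computes S_inter and S_intra for a batch of features and a scalar trace-ratio proxy for the discriminability criterion. The function names are ours, and the trace ratio is used here only as a simple scalar stand-in for J(D):

```python
import numpy as np

def scatter_matrices(features, labels):
    """Inter-class (S_inter) and intra-class (S_intra) scatter matrices.

    features: (n, d) array of extracted deep features f.
    labels:   (n,)  array of class ids.
    Follows the definitions above: mu_j is the center feature of class j,
    mu is the center of all features, and n_j = |F_j|.
    """
    mu = features.mean(axis=0)
    d = features.shape[1]
    S_inter = np.zeros((d, d))
    S_intra = np.zeros((d, d))
    for j in np.unique(labels):
        F_j = features[labels == j]
        mu_j = F_j.mean(axis=0)
        diff = (mu_j - mu)[:, None]
        S_inter += len(F_j) * (diff @ diff.T)
        S_intra += (F_j - mu_j).T @ (F_j - mu_j)
    return S_inter, S_intra

def discriminability(features, labels):
    """Scalar trace-ratio proxy for the criterion J(D):
    larger values mean the classes are easier to separate."""
    S_inter, S_intra = scatter_matrices(features, labels)
    return np.trace(S_inter) / np.trace(S_intra)
```

For two tight, well-separated clusters the ratio is large; as the clusters spread and overlap, it shrinks toward zero.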
The relation between the feature discriminability and the singular values of the feature matrix is investigated in [34,35]. The eigenvectors of the feature matrix corresponding to larger singular values represent the portion of features with better transferability. The information in the eigenvectors of the feature matrix corresponding to small singular values is beneficial to improving the feature discriminability. A sharper distribution of singular values degrades the feature discriminability. That is, strengthening the large singular values can improve the feature transferability, while strengthening the small singular values can improve the feature discriminability.
In this paper, we transfer the prior knowledge from the simulated SAR dataset S to the real SAR dataset T. The difference between S and T is small, so the feature transferability can be guaranteed. However, the discriminative information from the features corresponding to small singular values may be weakened. Consequently, we should suppress the large singular values and strengthen the small singular values to improve the feature discriminability.

Sample Spectral Regularization
A SAR target classification model consists of a feature extractor M and a classifier φ. The feature extractor M takes a batch of SAR target images as input and outputs the extracted feature matrix F = [f_1, f_2, ..., f_b], where b is the batch size. Then the extracted features are sent to the classifier φ, which outputs class probabilities for each sample.
A simple approach to improving the feature discriminability is to suppress the largest singular values so that the eigenvectors with small singular values are relatively strengthened; this is named Batch Spectral Penalization (BSP) and applied in [34]. To implement BSP, the singular values of the feature matrix F are computed by singular value decomposition (SVD) as follows:

F = U Σ V^T, (6)

where U and V are orthogonal matrices and Σ = diag(σ_1, σ_2, ..., σ_b) contains the singular values in descending order. Then BSP can be formulated as:

BSP(F) = η Σ_{i=1}^{k} σ_i^2, (7)

where η is a trade-off hyperparameter and σ_i is the i-th singular value in the diagonal of the singular value matrix Σ. [34] only suppresses the largest singular value, i.e., k is set to 1. BSP can be combined with the classification loss L_cls to train the model as follows:

L = L_cls + BSP(F), (8)

where L_cls can be formulated as:

L_cls = (1/b) Σ_{i=1}^{b} l(φ(f_i), y_i), (9)

and l is the cross entropy loss. Although BSP can improve the feature discriminability, it is conducted at the batch level and its performance can be affected by the batch size, especially for the sample restriction problem, where a large batch size may not be available.

We propose sample spectral regularization (SSR) and regularize the singular values at the sample level to bypass the above limitation of BSP. For each input SAR target example, the shape of the extracted feature f is (W, H, C). W and H denote the width and height of f, and C is the number of channels. We reshape the feature f to (W × H, C) so that the singular values of f can be computed by SVD as follows:

f = *U *Σ *V^T, (10)

where the singular value matrix *Σ is:

*Σ = diag(*σ_1, *σ_2, ..., *σ_τ), (11)

and τ = min(W × H, C). Let *σ_i denote the set of singular values in the diagonal of the singular value matrix *Σ for the i-th sample within a batch, that is, *σ_i = {*σ_{i,1}, *σ_{i,2}, ..., *σ_{i,τ}} with *σ_{i,1} ≥ ... ≥ *σ_{i,τ}. The sample-level SSR can be formulated as:

SSR = η Σ_{i=1}^{b} Σ_{j=1}^{k} (*σ_{i,j})^2. (12)

Besides, we further improve SSR by suppressing the large singular values and strengthening the small singular values simultaneously, directly narrowing the gap between the largest and smallest singular values. This regularization is named SSR† and can be formulated as:

SSR† = η Σ_{i=1}^{b} (*σ_{i,1} − *σ_{i,τ} − ε)^2, (13)

where ε is used to control the extent of the regularization constraint.
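As an illustration, the two sample-level penalties can be sketched in NumPy for a single sample's feature map. Note that the SSR† form below is our reconstruction of the gap penalty described in the text (a squared, ε-slacked gap between the largest and smallest singular values), so treat it as illustrative rather than the authors' exact implementation:

```python
import numpy as np

def ssr_penalty(feature_whc, eta=0.01, k=1):
    """SSR for one sample: eta times the sum of the squared top-k
    singular values of the feature map reshaped from (W, H, C)
    to (W*H, C), as in the text."""
    W, H, C = feature_whc.shape
    s = np.linalg.svd(feature_whc.reshape(W * H, C), compute_uv=False)
    return eta * np.sum(s[:k] ** 2)  # s is sorted in descending order

def ssr_dagger_penalty(feature_whc, eta=0.1, eps=0.0):
    """SSR-dagger for one sample: penalize the gap between the largest
    and smallest singular values, with slack eps (our reconstruction;
    the exact published form may differ)."""
    W, H, C = feature_whc.shape
    s = np.linalg.svd(feature_whc.reshape(W * H, C), compute_uv=False)
    return eta * max(s[0] - s[-1] - eps, 0.0) ** 2
```

With a perfectly "flat" spectrum (all singular values equal) the SSR† penalty vanishes, which is exactly the regime where the small singular values are no longer suppressed relative to the large ones.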

Transfer Learning with Sample Spectral Regularization
We combine the proposed SSR with the standard transfer-learning pipeline to solve the sample restriction problem of SAR target classification. Considering that generating SAR target examples with a simulator is easier than collecting real SAR target examples, we use sufficient simulated SAR data to solve the sample restriction problem, which is feasible in practice. As shown in Figure 2, the whole pipeline consists of pre-training and fine-tuning phases. During pre-training, the feature extractor M and the classifier φ are trained on the labeled simulated SAR data. During fine-tuning, M is initialized with the pre-trained parameters and φ is initialized randomly. Therefore, we can employ the proposed SSR during the pre-training phase to get better-initialized parameters, and employ SSR during the fine-tuning phase to improve the performance of the model. The model architecture with SSR or SSR† is shown in Figure 3.

Pre-train with SSR. We pre-train the model with SSR or SSR† on the simulated SAR dataset S. The loss function can be formulated as:

L = L_cls + SSR (or SSR†). (14)

The feature extractor M is optimized by L_cls and the proposed regularization together. The classifier is optimized by L_cls.
The pre-trained feature extractor M_pre can output features with better discriminability when optimized with the proposed SSR. This provides a good initialization for the subsequent fine-tuning phase and makes the model converge better.
Fine-tune with SSR. We fine-tune the pre-trained model with SSR or SSR† on the target SAR dataset T. The loss function is the same as in Equation (14). With a good initialization and the proposed regularization, the features from the fine-tuned feature extractor M_ft have better discriminability, so the model can yield better performance even when only a small amount of training data is available.
In summary, the model is pre-trained and fine-tuned with the loss function:

L = L_cls + SSR, (15)

or with the following loss function:

L = L_cls + SSR†. (16)

The detailed process with SSR is outlined in Algorithms 1 and 2.
In each iteration of Algorithm 1 (pre-training) and Algorithm 2 (fine-tuning), the combined loss is computed on a batch, and M and φ are updated with gradient descent.
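The combined objective L = L_cls + SSR minimized in both phases can be sketched in NumPy, assuming the feature extractor has already produced per-sample feature maps and logits (the function names are ours; an autodiff framework such as PyTorch would supply the gradients for the update step):

```python
import numpy as np

def cross_entropy(logits, labels):
    """L_cls: mean cross-entropy over a batch, computed with a
    numerically stable log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def combined_loss(logits, labels, sample_features, eta=0.01, k=1):
    """L = L_cls + SSR: the objective minimized in both the
    pre-training and fine-tuning phases. sample_features holds one
    (W, H, C) feature map per sample in the batch."""
    reg = 0.0
    for f in sample_features:
        W, H, C = f.shape
        s = np.linalg.svd(f.reshape(W * H, C), compute_uv=False)
        reg += np.sum(s[:k] ** 2)  # squared top-k singular values
    return cross_entropy(logits, labels) + eta * reg
```

Each training step evaluates this loss on a batch and then updates M and φ by gradient descent, exactly as summarized for Algorithms 1 and 2.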

An Intuitive Understanding
A convolutional neural network (CNN) based SAR target classification model commonly consists of a feature extractor and a classifier. The feature extractor is responsible for generating features with good discriminability, and the extracted features are then used for classification. Without spectral regularization, the feature extractor M and the classifier φ are updated with gradient descent as follows:

M ← M − α ∇_M L_cls,  φ ← φ − α ∇_φ L_cls, (17)

where α is the learning rate and ∇ is the nabla operator. During training, the classifier relies on the salient features with large singular values, as they dominate the feature representations. As a result, the features with large singular values are strengthened continuously and the features with small singular values are suppressed continuously, which makes the model very confident in target classification and drives it into the saturation area of the softmax activation, that is, the gradient magnitude of ∇L_cls becomes small. However, the features with large singular values may not always provide enough correct information for target classification, especially with limited training data. Therefore, we need to make the model capture as much useful information as possible. The proposed spectral regularization method aims to strengthen the features with small singular values. With the proposed spectral regularization, the model is updated as follows:

M ← M − α ∇_M (L_cls + SSR),  φ ← φ − α ∇_φ L_cls. (18)

When the model relies heavily on the salient features with large singular values, the gradient magnitude of the spectral regularization ∇SSR will be large, which can push the model out of the saturation area of the softmax activation and force the classifier to use more features for classification instead of only the salient features with large singular values. In this way, the proposed method can improve the feature discriminability and yield better SAR target classification results.

Implementation Details
We employ the same network architecture as in [33] for a fair comparison. The details of the network configurations are shown in Table 1.

The simulated SAR dataset. The simulated SAR dataset is devised by [36]. The simulated SAR images are generated by a simulation software according to the CAD models of the targets. The simulation software parameter values, e.g., material reflection coefficients and background variation, were set according to the imaging parameters of the MSTAR dataset so that the appearance of the simulated images is close to that of real SAR images. There are fourteen target classes from seven types of targets, because each target type has two different CAD models. Some SAR images of the simulated SAR dataset are shown in Figure 6. The details of the simulated SAR dataset are shown in Table 2.

Training Details
During the pre-training stage, the model is trained for 400 epochs with an SGD optimizer. The learning rate is 0.001, the momentum rate is 0.9 and the weight decay is 0.0005. During the fine-tuning stage, the model is trained for 200 epochs with an SGD optimizer and the learning rate is 0.01. The hyperparameter η is tuned through cross validation: η is 0.01 and 0.1 for SSR and SSR† respectively, and η in BSP is set to 0.001. The slack hyperparameter ε in SSR† is set to zero. k in BSP and SSR is set to 1, that is, only the largest singular value is constrained.
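For quick reference, the reported hyperparameters can be gathered into a small configuration sketch (the dictionary and variable names are ours, not taken from the authors' code):

```python
# Training hyperparameters reported in the text, in one place.
PRETRAIN = {"epochs": 400, "optimizer": "SGD", "lr": 0.001,
            "momentum": 0.9, "weight_decay": 0.0005}
FINETUNE = {"epochs": 200, "optimizer": "SGD", "lr": 0.01}

# Trade-off weight eta for each regularizer (tuned via cross validation).
ETA = {"SSR": 0.01, "SSR_dagger": 0.1, "BSP": 0.001}

EPS_SSR_DAGGER = 0.0  # slack term in SSR-dagger (set to zero)
K = 1                 # only the largest singular value is constrained
```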

SSR with Limited Training Data
To solve the sample restriction problem, we propose SSR and combine it with the transfer-learning pipeline. The proposed SSR suppresses large singular values and strengthens small singular values to improve the feature discriminability, which can make the deep learning based SAR classification model converge well with limited training data. In this section, we first evaluate the performance of different spectral regularization methods, then compare our best spectral regularization solution with baselines.
The performance of SSR. We evaluate the performance of BSP, SSR and SSR†, and the experimental results are shown in Table 3. The simulated SAR dataset is used for pre-training, then we fine-tune the model using 10% of the real SAR images at a 17° depression angle. All of the real SAR images at a 15° depression angle are used for testing. The details of the pre-training, fine-tuning and testing data are shown in Tables 1 and 2.

Table 3. Test accuracies of different methods with or without spectral regularization. × denotes no spectral regularization. Note that for ID 1, 3, 4 and 5 the model is trained only on the limited real SAR data, without the pre-training stage.

Firstly, BSP, SSR and SSR† are integrated into a standard training pipeline, where the model is trained from scratch (ID 3, 4, 5 in Table 3). With the three spectral regularizations, the trained models yield slightly higher classification accuracies than the model without spectral regularization (ID 1 in Table 3). Secondly, we initialize the models using the pre-trained parameters, then fine-tune them with BSP, SSR and SSR† respectively (ID 6, 7, 8 in Table 3). Based on the same pre-trained model, fine-tuning with the spectral regularizations achieves better classification results than fine-tuning without spectral regularization (ID 2 in Table 3), with SSR† in particular bringing a 4.3% relative classification accuracy improvement. Moreover, the models with the proposed SSR and SSR† significantly outperform the one with BSP. Thirdly, we pre-train the models with BSP, SSR and SSR† respectively, then fine-tune them without spectral regularization to investigate how much improvement is brought by the three spectral regularizations during pre-training alone (ID 9, 10, 11 in Table 3).

Obviously, BSP, SSR and SSR † can provide better-initialized parameters than pre-training without spectral regularization.
The above experimental results indicate that applying spectral regularization to both pre-training and fine-tuning can improve the final classification accuracies in the limited-data regime. SSR† is the best regularization method for both pre-training and fine-tuning, so we combine SSR† with transfer learning as our solution for the sample restriction problem (ID 12 in Table 3), which yields the best classification accuracy of 88.8%.
Classification results Under SOC. In the limited-data regime, the model is evaluated under the standard operating condition (SOC). After pre-training with the simulated SAR images, the feature extractor M is initialized with the pre-trained parameters and the model is fine-tuned using 10% of the real SAR images at a 17° depression angle. All of the real SAR images at a 15° depression angle are used for testing. The baseline methods are the original CNN (CNN_ORG), CNN with transfer learning (CNN_TF) [26], CNN with parameter prediction (CNN_PP) [37], the cross-domain and cross-task method (CDCT) [32] and the probabilistic meta-learning method (PML) [33].
The experimental results are shown in Table 4. CNN_ORG yields the lowest classification accuracy of 75.5%. From this we can see that training the model using limited SAR data is a challenge. The standard transfer-learning can significantly improve the classification accuracy by using the prior knowledge learned from the sufficient simulated SAR images. CNN_PP, CDCT and PML achieve better classification results based on the carefully designed pipeline and model framework. In contrast, our method is very simple and effective, which achieves a comparable accuracy of 88.8% to CNN_PP, CDCT and PML.
Besides, Figure 7 illustrates the detailed classification accuracies with different proportions of training data. When increasing the proportion of training data used, the classification accuracies of all methods will become higher. Our method can achieve comparable or better performances for different proportions of training data used.

Classification results Under Depression Variations.
In the limited-data regime, the model is evaluated at different depression angles. During the pre-training stage, we train the model on the simulated SAR dataset. Then, the pre-trained parameters are used to initialize the feature extractor M. We select 3 target classes from the real SAR dataset to fine-tune and test the model. Table 5 shows the details of the training and test data on the real SAR dataset. The model is fine-tuned on 10% of the real SAR images at a 17° depression angle. At test time, the model is evaluated with images at 30° and 45° depression angles.
The experimental results are shown in Table 6. For the 30° depression angle, our model achieves the best classification accuracy of 91.0%. When the testing depression angle increases from 30° to 45°, our model still yields a competitive classification result.
In summary, the above experiments prove that the proposed spectral regularization SSR † is a feasible way to solve the sample restriction problem for SAR target classification. Although SSR † is simple, it performs well with limited training data under SOC and depression variations.

SSR with Sufficient Training Data
In this section, we investigate how much improvement is brought by the proposed spectral regularization when fine-tuning the model with sufficient data.
Classification results Under SOC. The comparison experiments are conducted under the standard operating condition (SOC). The whole pipeline is the same as above. We pre-train the model with the simulated SAR images and initialize the feature extractor M using the pre-trained parameters. Then the model is fine-tuned using all of the real SAR images at a 17° depression angle. We perform evaluation on the real SAR images at a 15° depression angle. The baseline methods are CNN_ORG, CNN_TF, CDCT, PML, KSR [38], TJSR [39], CDSPP [40], KRLDP [41], MCNN [42] and MFCNN [43]. CNN_ORG and CNN_TF are selected as the simplest baselines. KSR and TJSR are two sparse representation methods, and CDSPP and KRLDP are two discriminant projection methods. MCNN and MFCNN are based on convolutional neural networks (CNN).
The experimental results are shown in Table 7. Our method can achieve a comparable classification result to CDCT, MCNN and PML. With the proposed spectral regularization, the deep learning based feature extractor can acquire better feature discriminability, providing a 1.5 point boost in classification accuracy.
Classification results Under Depression Variations. Three target classes from the real SAR dataset are selected to fine-tune and test the model. The details of the training and test data on the real SAR dataset are shown in Table 5. All of the real SAR images at a 17° depression angle are used to fine-tune the model. The real SAR images at 30° and 45° depression angles are used for evaluation.
The experimental results are shown in Table 8. Our model achieves the best classification accuracy of 98.9% at a 30° depression angle, a 4.9 point boost in classification accuracy over CNN_TF. For the 45° depression angle, the classification accuracies of all methods degrade dramatically. It should be noted that the classification accuracy of CNN_TF decreases when the amount of training data increases from 10% to 100%. That is likely because standard transfer learning cannot work well when the training images are very different from the test images [32]. In contrast, our model yields a comparable classification result to TJSR and provides a 6.7 point boost in classification accuracy over CNN_TF.
In summary, the above experiments prove that the proposed spectral regularization SSR † can also promote the learning of the classification model with sufficient data. In the sufficient-data regime, cooperating with SSR † , the performance of CNN_TF can be significantly improved under SOC and depression variations.

SVD Analysis
In this section, we analyze the effectiveness of the proposed SSR and SSR†. With the pre-trained parameters as initialization, the model is fine-tuned on the real SAR images at a 17° depression angle, and the max-normalized singular values at different epochs are visualized.
Firstly, we visualize the batch-level singular values of the feature matrix F = [f_1, f_2, ..., f_b], which is produced from a batch of SAR image features. As shown in Figure 8, the difference between the large and small singular values is very big in the first epoch. As training progresses, this difference decreases gradually. It is obvious that BSP gives the small singular values more dominance than CNN_TF without spectral regularization, that is, the model with BSP has better feature discriminability. As a consequence, BSP provides a 2.3 point boost in classification accuracy over CNN_TF.

Secondly, we visualize the singular values of the sample-level feature matrix f, which is generated by reshaping the extracted SAR image feature from (W, H, C) to (W × H, C). Figure 9 shows the max-normalized sample-level singular values at different epochs. Both SSR and SSR† strengthen the small singular values to improve the feature discriminability. Obviously, SSR† achieves a significantly smaller difference between the large and small singular values than SSR, because SSR† is designed to reduce this difference directly. As a consequence, SSR† provides a 4.3 point boost in classification accuracy over CNN_TF.

Although both BSP and SSR improve the feature discriminability by suppressing the largest singular values, the sample-level spectral regularization SSR performs better than the batch-level spectral regularization BSP. We attribute this difference to the level at which the spectral regularization is applied. The sample-level regularization can improve the feature discriminability for each SAR image feature precisely, while the batch-level spectral regularization works on a batch of SAR image features, where the image features can affect each other. Therefore, spectral regularization at the sample level is more effective than at the batch level (3.2% boost of SSR vs. 2.3% of BSP).
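The max-normalization used in this analysis is simply each sample's singular value spectrum divided by its largest value; a minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def max_normalized_singular_values(feature_whc):
    """Singular values of one sample's (W*H, C) feature matrix,
    divided by the largest one (the quantity plotted per epoch)."""
    W, H, C = feature_whc.shape
    s = np.linalg.svd(feature_whc.reshape(W * H, C), compute_uv=False)
    return s / s[0]  # s[0] is the largest singular value
```

A flatter normalized spectrum (values closer to 1) indicates that the small singular values have gained dominance, i.e., better feature discriminability.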
For the original transfer-learning method, the difference between the large and small singular values remains very large throughout training, and the large singular values are dominant. In contrast, for the proposed methods SSR and SSR†, the difference between the large and small singular values is reduced along with training, and the dominant position of the large singular values is weakened, i.e., the small singular values are strengthened. Strengthening the small singular values makes the model use more diverse discriminative information for classification and generalize well with limited training samples.
In summary, our best spectral regularization SSR† directly reduces the difference between the large and small singular values at the sample level. The effectiveness of SSR† is proved by the above experiments. Applying SSR† to both the pre-training and fine-tuning stages achieves better results. Besides, SSR† can be plugged into any CNN based SAR target classification model to achieve performance gains, whether or not the training data is sufficient.

Noise Robustness
The small singular values strengthened by our method may contain some noise, which can degrade the classification accuracy. In this paper, we combine the classification loss with SSR or SSR† and use a trade-off hyperparameter η to balance the two terms, with the classification loss kept dominant. This setting ensures that noise harmful to discrimination is suppressed. Therefore, the proposed method is noise-robust, as confirmed by the above experiments, where our method achieves competitive classification results.

Conclusions
It's difficult to train CNN based models with limited data in SAR target classification. To solve this sample restriction problem, we propose the sample spectral regularization, which can regularize the singular values of each SAR image feature to improve the feature discriminability. The proposed SSR method has been integrated into a transfer learning framework to maximize its potential performance. The experimental results indicate that the proposed regularization is a feasible approach for the sample restriction problem in SAR target classification, and conducting spectral regularization at the sample-level is better than batch-level. Besides, the proposed method can improve the classification accuracy as well when the training data is sufficient. It should be noted that our simple and effective spectral regularization can be plugged into any CNN based SAR classification models besides the implemented transfer learning framework, which is expected to benefit numerous researchers in this area.
Author Contributions: W.L. and T.Z. conceived and designed the experiments; W.L. and T.Z. performed the experiments and analyzed the data; W.L. and T.Z. wrote the paper; W.D., X.S. and L.Z. contributed materials; and K.F. and Y.W. supervised the study and reviewed this paper. All authors have read and agreed to the published version of the manuscript.