A Comprehensive Survey on SAR ATR in Deep-Learning Era

Abstract: Due to the advantages of Synthetic Aperture Radar (SAR), Automatic Target Recognition (ATR) has become a hot research topic. Deep learning, especially the Convolutional Neural Network (CNN), works in an end-to-end way and has powerful feature-extraction abilities. Thus, researchers in SAR ATR also seek solutions from deep learning. In this paper, we review the algorithms related to SAR ATR. We first introduce the commonly used datasets and evaluation metrics. Then, we introduce the algorithms used before deep learning: template-matching-, machine-learning- and model-based methods. After that, we introduce the SAR ATR methods of the deep-learning era (after 2017), which are the core of the paper. The non-CNN and CNN models used in SAR ATR are summarized first. We found that researchers tend to design specialized CNNs for SAR ATR. Then, the methods to solve the problem raised by limited samples are reviewed: data augmentation, Generative Adversarial Networks (GAN), electromagnetic simulation, transfer learning, few-shot learning, semi-supervised learning, metric learning and domain knowledge. After that, the imbalance problem, real-time recognition, polarimetric SAR, complex data and adversarial attack are reviewed, together with their principles and problems. Finally, future directions are discussed. We point out that datasets, CNN architecture design, knowledge-driven methods, real-time recognition, explainability and adversarial attack should be considered in the future. This paper gives readers a quick overview of the current state of the field.


Introduction
Compared with optical sensors, Synthetic Aperture Radar (SAR) can obtain high-resolution images all day and in any weather. Thus, SAR is widely used in military and civilian sectors. The purpose of SAR ATR is to automatically recognize important targets (vehicles, ships and aircraft), which is the key technology of reconnaissance [1,2]. Lincoln Laboratory proposed a three-level flow chart of SAR ATR [3], which includes detection, discrimination and classification, as shown in Figure 1. Detection algorithms find Regions of Interest (RoI) containing potential targets [4]. CFAR (Constant False Alarm Rate) is a common method for such detection. It first determines a threshold according to the input image and then compares every pixel with it. If a pixel exceeds the threshold, it is regarded as a target; otherwise, it is regarded as background. The core of the algorithm is to describe the image in terms of its statistical characteristics; the lognormal, Weibull and K distributions are usually used. When the background is clear, CFAR performs well. However, when the background is complex or the target is weak, it can produce false alarms [5,6].
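The thresholding idea behind CFAR can be illustrated with a minimal sketch. It is not the detector of any cited paper: it assumes a single global threshold and exponentially distributed clutter intensity (a simpler model than the lognormal, Weibull or K distributions named above), and the function name `global_cfar`, the false-alarm rate and the synthetic scene are all hypothetical.

```python
import numpy as np


def global_cfar(intensity, pfa=1e-3):
    """Flag pixels above a threshold set by a desired false-alarm rate.

    Assumption (not from the survey): clutter intensity is exponentially
    distributed, so P(x > T) = exp(-T / mean) and T = -mean * ln(pfa).
    """
    clutter_mean = intensity.mean()            # crude global clutter estimate
    threshold = -clutter_mean * np.log(pfa)    # invert the exponential tail
    return intensity > threshold, threshold


# Synthetic scene: unit-mean exponential clutter plus a bright 3x3 "target".
rng = np.random.default_rng(0)
scene = rng.exponential(scale=1.0, size=(64, 64))
scene[30:33, 30:33] = 50.0
detections, threshold = global_cfar(scene, pfa=1e-3)
print(f"threshold={threshold:.2f}, flagged pixels={int(detections.sum())}")
```

Practical CFAR detectors usually estimate the clutter statistics in a sliding window around each pixel rather than over the whole image, which is what allows the threshold to adapt to a changing background.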
The purpose of discrimination is to eliminate false alarms generated by natural and artificial clutter. The features usually used are geometric features, centroid, aspect ratio, backscatter features, texture features, polarization features, etc. [7,8]. The purpose of classification is to determine the categories of the targets. Template-matching-, machine-learning- and model-based methods are the three commonly used approaches. Among them, the machine-learning method gives better results and is widely used. It includes two key steps: feature design and classifier design.
Machine-learning-based SAR ATR has achieved great improvements over earlier algorithms. Its core consists of designing distinguishing features and powerful classifiers. Shallow features such as aspect ratio, SURF (Speeded Up Robust Features) and LBP (Local Binary Pattern) are usually used. Neural networks, decision trees, SVM (Support Vector Machine) and random forests are the commonly used classifiers [9].
A SAR image is formed by the combination of scattering units with electromagnetic-scattering features. It exhibits speckle, geometric distortion, shadow and other phenomena. SAR is vulnerable to changes in working conditions, for example, polarization mode, imaging angle and target scattering. Furthermore, the samples are limited. Datasets also have large intra-class differences and small inter-class differences, which bring difficulties to classification. Robust feature extraction is difficult and the class distribution is unbalanced, which renders ATR even more difficult.
Since the emergence of AlexNet in 2012, deep learning has shown advantages over traditional methods. Traditional methods of feature extraction rely mainly on human experience and have poor generalization performance [10]. Deep learning (especially the Convolutional Neural Network, CNN) automatically learns features from data, so feature extraction and classification are performed at the same time. Thus, it has strong high-level feature-learning ability and high classification accuracy. Due to these advantages, SAR ATR has also gradually adopted this method. A CNN can automatically learn effective features, thus avoiding the difficulties of designing features manually.
We gathered statistics from relevant papers and list the number of papers on traditional and deep-learning-based algorithms in this area in recent years in Table 1. We can see that SAR ATR entered the era of deep learning in 2017; since then, most of the papers have adopted a deep-learning method. SAR ATR has the following difficulties in practical applications due to the large differences from optical images.
1. The number of SAR images is insufficient. This is the main reason that restricts the application of deep learning in SAR ATR. It leads to serious over-fitting, resulting in low generalization. Thus, most of the papers on SAR ATR try to improve the recognition result on limited samples.
2. Some classes have more samples, and some classes have fewer samples. The existing datasets are generally imbalanced among categories, which also restricts the results.
3. SAR images obtained under different conditions have different characteristics, which renders it difficult for existing data-driven deep-learning methods to extract robust features.
4. The SAR scattering center changes with the target azimuth angle, so the recognition system gives different results at different azimuth angles, even for small azimuth increments.
Since 2017, a substantial number of achievements have been made in solving the above problems. However, no paper has systematically studied them, which is one of the motivations of this paper. Therefore, we selected the 197 most representative papers for review. The framework of the paper is shown in Figure 2.

Related Work
As far as we know, there are seven papers [9,11-16] which are related to our work to some extent. We divided them into three directions, as shown in Figure 3: (1) reviews on traditional methods; (2) reviews mainly on traditional methods, in which deep-learning methods are not reviewed thoroughly; (3) reviews mainly on optical images, in which SAR images are not reviewed thoroughly.

Li et al. [9] surveyed the feature extraction of SAR images. Wu et al. [11] reviewed the techniques of ship classification with SAR images over the past twenty years and gave some comments and suggestions for this area. Wang et al. [12] summarized the feature extractors from three directions. All three papers are surveys of traditional algorithms; deep-learning-based algorithms are not analyzed.
Odysseas et al. [13] surveyed the SAR ATR algorithms, specifically those trained and tested on MSTAR. The reflectivity attributed, attributed scattering centers, sparse representation, hybrid reflectivity attributed and compressive sensing-based methods are introduced in that order. The strengths and weaknesses of each technique are analyzed. The problem and the direction of the dataset are also highlighted. Darymli et al. [14] analyzed the challenges of SAR ATR. They divided SAR ATR into three steps: detection, low-level classification and high-level classification. The authors divided the methods into model-, semi-model- and feature-based methods. These two papers reviewed many SAR ATR methods but did not focus on deep learning.
Song et al. [15] surveyed the advanced CNNs in classification. The specialized CNNs, public datasets and data augmentation methods are also introduced, and the problems and challenges are pointed out. John et al. [16] reviewed deep learning in view of theories, tools and challenges. The inadequate datasets, transfer learning, theoretical understanding and optimizing methods are analyzed in that order. These two papers give a systematic overview of deep-learning-based recognition, but they focus mainly on optical images, while SAR images do not constitute the core.
In summary, our work is different from the aforementioned related work. It is the first paper that systematically reviews deep-learning-based SAR ATR.

Datasets
Datasets with labels are the basis of SAR ATR. Currently available datasets include MSTAR (Moving and Stationary Target Acquisition and Recognition) [17], OpenSARShip [18], OpenSARShip 2.0 [19], OpenSARUrban [20] and FUSAR-Ship [21], as shown in Figure 4.
OpenSARShip was constructed by Shanghai Jiao Tong University. Its information is shown in Table 2. It contains common types of civilian ships. The ships in OpenSARShip are derived from 41 SAR images acquired by Sentinel-1. During the production of the dataset, the Automatic Identification System (AIS) information is used. The scenes include five ports: Shanghai, Shenzhen, Tianjin, Yokohama and Singapore. The dataset uses GRD (Ground Range Detected) products and SLC (Single Look Complex) products. It includes 11 ship classes and 11,346 ship slices. Among them, Cargo constitutes the majority, accounting for 72.47%, and some categories have too few samples.
OpenSARShip 2.0 is similar to OpenSARShip. It has 34,528 SAR chips with AIS information. Some of the ship chips of OpenSARShip 2.0 contain undesired effects, and some have interference information.
FUSAR-Ship constructs the dataset via SAR-AIS matchup. The data sources are 126 GF-3 SAR images. It has 5000 ship chips with AIS information, covering 15 ship categories and 98 sub-categories. It has the following characteristics: high resolution, consistency, diversity, extensibility and large scale. It can also be used for detection, wake tracking and semantic segmentation.

In addition to the military vehicles and ships above, aircraft recognition has also been studied, but there is no public dataset yet. An airplane has many scattering points. Due to the complex structure of the airplane, different parts have different scattering mechanisms, which are variable. Therefore, the feature diversity of aircraft renders aircraft recognition difficult.
In addition to the above real SAR data, many papers also use simulation to generate datasets to train recognition algorithms [22].
The above SAR datasets are very small when compared with optical datasets. Thus, many researchers try to solve the problem raised by limited samples, and the methods are shown in Section 5.3.

Evaluation Metrics
There are many indicators used to evaluate a recognition algorithm, and they are calculated from the confusion matrix. Thus, we first introduce the confusion matrix, as shown in Figure 5.

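The accuracy is generally used to evaluate the global accuracy of a model. With TP, TN, FP and FN denoting the numbers of true positives, true negatives, false positives and false negatives in the confusion matrix, it is calculated as follows:

$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision represents the ratio of ships that were correctly found among the positive detected results. It is calculated as follows:

$$\mathrm{precision} = \frac{TP}{TP + FP}$$

Recall represents the ratio of ships that were correctly found among the ground truth. It is calculated as follows:

$$\mathrm{recall} = \frac{TP}{TP + FN}$$

Precision and recall are contradictory. In order to give consideration to precision and recall at the same time, the F1-score is proposed. F1 is computed as below.

$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$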
The P-R (Precision-Recall) and ROC (Receiver Operating Characteristic) curves are also commonly used for comprehensive evaluation of recognition algorithms. The horizontal axis of the P-R curve is the recall, and the vertical axis is the precision. The larger the area under the P-R curve, the better the classifier. Likewise, the larger the area under the ROC curve, the better the classifier.
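As an illustration of how these indicators follow from the confusion matrix, the sketch below computes them from the four counts of a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and are not taken from any experiment in the reviewed papers.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no positive predictions / no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts for illustration only.
metrics = binary_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For multi-class recognition, the same quantities are typically computed per class (treating that class as positive and all others as negative) and then averaged.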

The Traditional Methods
Generally, traditional SAR ATR methods are template-matching-, model- and pattern-recognition-based, as shown in Figure 6.

Template-Matching-Based Methods
The template-matching-based methods build a template library from a large number of samples. A test sample is compared with the templates under a similarity criterion such as the Mean Square Error (MSE) [23,24]. The category with the highest matching similarity is used as the prediction. These methods can be divided into direct-matching and correlation-filtering methods. Although template matching is simple in engineering, it has the following problems. It is not robust enough and works only under restricted conditions; under unrestricted conditions, the performance degrades seriously. Furthermore, it requires many templates, which is difficult to implement. As the number of categories and samples increases, the template library grows, and the real-time performance becomes worse. Therefore, in the era of artificial intelligence, the application of template matching is gradually shrinking.
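A direct-matching classifier with the MSE criterion can be sketched as follows. This is only an illustration under simple assumptions: templates and test chips are assumed to be registered arrays of equal size, and the function name `match_template`, the class labels and the toy data are hypothetical.

```python
import numpy as np


def match_template(chip, template_library):
    """Return the label whose template has the lowest MSE to the chip.

    template_library maps a class label to a list of template arrays with
    the same shape as the chip (hypothetical layout for this sketch).
    """
    best_label, best_mse = None, np.inf
    for label, templates in template_library.items():
        for template in templates:
            mse = float(np.mean((chip - template) ** 2))
            if mse < best_mse:                 # lower MSE = higher similarity
                best_label, best_mse = label, mse
    return best_label, best_mse


# Toy library with one template per class.
library = {
    "class_a": [np.zeros((4, 4))],
    "class_b": [np.ones((4, 4))],
}
test_chip = np.full((4, 4), 0.9)
label, mse = match_template(test_chip, library)
print(label, round(mse, 4))
```

The nested loop makes the cost grow with the size of the library, which mirrors the real-time limitation described above.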

Machine-Learning-Based Methods
As pattern-recognition theory progressed, machine learning was also adopted in SAR ATR. It has two steps, as shown in Figure 7 [25,26]. Firstly, the features that are helpful for recognition are extracted, and a combination of them is selected as the feature vector. Then, according to a certain similarity measure, a classifier that can distinguish targets is designed. The process can be divided into two stages: training and testing. In the training phase, SAR image features are extracted, and then the classifier is optimized using the extracted features and the labels. Through the optimization algorithm and the samples in the dataset, the model converges. When new samples are input into it, it outputs the result. The machine-learning method has low storage requirements and high processing speed. Whether robust features can be extracted is crucial to the final recognition accuracy. Unlike targets in optical images, which have complete contours, targets in SAR images have sparse scattering centers and are very sensitive to azimuth changes [27,28]. Thus, extracting useful features is difficult for SAR ATR. Geometric structure features such as perimeter, area and aspect ratio and electromagnetic-scattering features such as peak value and scattering center are usually used. Transform features such as the Fourier transform and wavelet transform, local invariant features such as SIFT (Scale-Invariant Feature Transform) and generalized invariant moments are also usually used. Features with strong discriminative ability play an important role in recognition.
Designing an appropriate classifier is another important step. Typical methods include the support vector machine, neural network, adaptive boosting, sparse representation, K-Nearest Neighbor (KNN) and Bayes.
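The two-step pipeline (feature extraction followed by a classifier) can be illustrated with a small sketch that uses three of the features named above (area, aspect ratio and peak value) and a KNN classifier. The segmentation threshold, the helper names and the synthetic chips are assumptions made for illustration; real systems would normalize the features and use far richer descriptors.

```python
from collections import Counter

import numpy as np


def extract_features(chip, threshold=0.5):
    """Return [area, aspect ratio, peak value] of the bright region in a chip."""
    mask = chip > threshold                       # crude target segmentation
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:                            # nothing above the threshold
        return np.array([0.0, 0.0, float(chip.max())])
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    return np.array([float(mask.sum()), width / height, float(chip.max())])


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def make_chip(height, width, rng):
    """Synthetic 16x16 chip: weak background plus a bright rectangle."""
    chip = rng.uniform(0.0, 0.2, size=(16, 16))
    chip[4:4 + height, 4:4 + width] = 1.0
    return chip


rng = np.random.default_rng(0)
train_chips = [make_chip(2, 8, rng) for _ in range(3)] + \
              [make_chip(4, 4, rng) for _ in range(3)]
train_y = ["long"] * 3 + ["square"] * 3
train_x = np.stack([extract_features(c) for c in train_chips])

prediction = knn_predict(train_x, train_y, extract_features(make_chip(2, 8, rng)))
print(prediction)
```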
The deep-learning network has emerged in recent years. The most typical network is the CNN. It adopts the strategy of automatic feature extraction such that it can extract robust features from many samples, which is more advantageous than the traditional method. A substantial amount of research shows that deep learning is an effective method for SAR ATR.

Model-Based Method
The model-based method mainly generates images under different conditions through a 3D electromagnetic-scattering model or a Computer-Aided Design (CAD) model [29]. Because the model can be processed and operated on during the calculation, the electromagnetic-scattering features under different conditions can be flexibly simulated. Its core is the PEMS (Prediction, Extraction, Matching and Search) subsystem. However, this method has some shortcomings. First, the physical simulation is difficult to run in real time. Second, the data generated via simulation are not electromagnetic-scattering characteristics with a clear physical meaning. Third, when the structure of a target part or its scenario changes, the overall calculation needs to be conducted again. These shortcomings restrict the application of the physical model in practice.

The Deep-Learning-Based Methods
Since the success of AlexNet in ILSVRC (ImageNet Large Scale Visual Recognition Challenge), the key to image classification has turned from feature design to CNN design. Many CNNs, such as VGGNet [30], Inception [31], ResNet [32], ResNeXt [33] and DenseNet [34], have been proposed. Due to the huge advantages of CNN, it is also used in SAR ATR, where it shows good performance, and many papers have emerged. This paper mainly summarizes deep-learning-based SAR ATR algorithms from eight aspects, as shown in Figure 8.

The Non-CNN Models
Before the success of CNN, many non-CNN deep-learning models were used in feature representation, as shown in Figure 9, for example, the Restricted Boltzmann Machine (RBM) [35], the Deep Belief Network (DBN) [36], the auto-encoder and so on. An RBM consists of two shallow layers, a visible layer and a hidden layer, which are fully connected with each other. It learns a probability model from input data. A DBN is composed of multiple stacked RBMs. It uses a layer-by-layer unsupervised method to learn parameters, which addresses the difficulty of optimizing models with many hidden layers. It can train deep networks and laid a foundation for the later results of deep learning. An auto-encoder is trained to make its output similar to its input. It is an unsupervised learning process which is mainly used for data dimension reduction or feature extraction. Non-CNN models have been applied to SAR ATR in [37][38][39][40][41].
Reference [37] proposed a discriminant deep belief network, which is used to learn high-level features of targets. A weak classifier is trained with pseudo-labels. Then, a specific SAR image block is represented by a set of projection vectors. Finally, the projection vectors are input to produce discriminative features for classification. Reference [38] used an unsupervised learning method to build a pre-training model for feature extraction. This method can effectively use more samples. Reference [39] proposed a compact convolutional auto-encoder for ATR. It produced a more discriminative feature representation by imposing compactness constraints on the encoder while minimizing the reconstruction loss. In reference [40], the deep network is divided into a convolutional auto-encoder network and a shallow neural network. The convolutional auto-encoder is trained via unsupervised learning as a feature extractor, and the shallow neural network containing a fully connected layer is trained via supervised learning to predict the target category. Carlos et al. [41] also used a de-noising auto-encoder to build a pre-training network for feature extraction to classify ships in SAR images.
Compared with these non-CNN models, CNN is strongly supervised and, thus, has the advantage of high accuracy. Thus, in the deep-learning era, these non-CNN models are not mainstream, and related research is relatively scarce.

The CNN Models
The CNNs used in SAR ATR are described below. They are the off-the-shelf CNN, the specialized CNN, the attention-based CNN and the capsule network, as shown in Figure 10. In the early years, researchers preferred to adopt off-the-shelf CNN models in SAR ATR. This is because the success of CNN had not yet been proven in SAR ATR.
Shao et al. [42] compared the existing CNNs on SAR ATR in detail for the first time. The classical CNNs (for example, AlexNet, VGGNet, GoogLeNet, ResNet, DenseNet and SENet) are applied on MSTAR. The results showed that most of the CNNs can obtain an accuracy of 99% on MSTAR, which is superior to the traditional algorithms. The running speeds are also analyzed in the paper. Fu et al. [43] used ResNet to obtain a good recognition performance on the small dataset. A dropout layer is added to the building block, and the center loss and softmax loss are adopted. It achieved an accuracy of 99.67% on MSTAR. Soldin et al. [44] used ResNet-18 on MSTAR to verify the effectiveness of deep-learning-based SAR ATR. It had 99% accuracy with 10 types of targets. Anas et al. [45] adopted VGG-16 to extract features. The parameters are first trained on ImageNet, and then the last three convolutional layers are re-trained on MSTAR. It achieved an accuracy of 97.91% on 10 different classes.
The above studies try to demonstrate the effectiveness of deep learning in SAR ATR. However, due to the differences between SAR and optical images, it is not appropriate to directly use CNNs from computer vision for SAR ATR. Thus, researchers are more inclined to design specialized CNNs, as shown in the next section.

Specialized CNN for SAR ATR
Researchers tend to design specialized CNNs for SAR ATR. The specialized CNNs can be divided into shallow and deep forms.
a. The shallow CNN
Morgan et al. [46] designed a new CNN for SAR ATR. It has three convolutional layers, two max-pooling layers and one fully connected layer. It achieved 92.3% accuracy on a 10-way MSTAR dataset. Chen et al. [47] proposed a fully convolutional network. It has five convolutional layers, three max-pooling layers and one softmax layer. Results showed that it can achieve 99% accuracy. Xu et al. [48] proposed SARNet for SAR ATR. It has two convolutional-pooling and two fully connected layers. It achieved 95.68% on an MSTAR dataset. Li et al. [49] proposed DeepSAR-Net for learning discriminative features without human intervention. It consists of four repeated convolutional, normalization and max-pooling layers and two repeated convolutional, normalization and ReLU layers. It achieved 98.36% accuracy on three-class MSTAR. Liu et al. [50] presented a new convolutional network for SAR ATR. It has six convolutional layers and one fully connected layer. Data augmentation is also used to overcome the limited sample problem. It achieved 99.48% accuracy on five-class MSTAR. Qiao et al. [51]
The above-mentioned CNNs for SAR ATR appeared in the early stage. These CNNs are stacked with several convolutional and pooling layers and connected to a classifier at the end. They have few layers. According to the common knowledge of deep learning, the deeper the network is, the stronger its feature expression ability. Thus, their recognition abilities are worse than those of deep CNNs in computer vision. Therefore, it is necessary to use deep CNNs for SAR ATR. A large number of related achievements have also appeared. We review them in detail in the next section.
b. The deep CNN
Zhai et al. [56] proposed MF-SarNet for SAR ATR. The fire module is used for extracting features with fewer parameters. MF-SarNet consists of eighteen convolutional layers, two fully connected layers and eight fire modules. Clockwise rotation is used as data augmentation to expand the dataset 360 times. It achieved 98.53% accuracy on MSTAR. Xie et al. [57] presented a neural network named umbrella. Umbrella has two blocks; one is a summation of three 3-layer paths, and the other is the concatenation of three 3-layer paths. The fusion of the six paths can extract rich features from different spatial scales. The designed CNN has five convolutional layers and two umbrella layers. It achieved 99% accuracy on a 10-class MSTAR dataset. Huang et al. [58] presented a new CNN called group squeeze-excitation sparsely connected convolutional networks. It conducts reweighting with fewer parameters and is more efficient than DenseNet. It achieved 99.79% accuracy on an MSTAR dataset. Dong et al. [59] proposed a global receptive approach for building a special hierarchy of feed-forward neural networks. It has two feature-generation and refinement modules. Multiple receptive signals are used to extract features. Expert knowledge is also transplanted into the neural network. It achieved 95.07% accuracy on an MSTAR dataset. Wang et al. [60] presented SSF-Net with a sparse data feature extraction module. The other layers are also used to improve the efficiency. It has 99.55% accuracy. Wang et al. [61] proposed DNet, which can learn scale information. Special layers are added to render it more standardized and practical. It achieved 99% accuracy on a 10-class MSTAR dataset. Feng et al. [62] proposed a convolutional neural network to fully learn the feature information of SAR images. It first performs noise suppression on SAR images. It consists of 7 convolutional layers and 7 pooling layers. It shows better results on the 10-class MSTAR dataset. Pei et al. [63] presented a feature extraction and fusion network for recognizing targets in multi-view SAR images. It is based on a multiple-input network with deformable convolution and squeeze-and-excitation. It achieved 99.31% accuracy on 10-class MSTAR. Wang et al. [64] proposed a multi-view CNN with deformable convolution for a limited dataset. The deformable convolution can learn characteristics of the targets and capture more information from different views. Shang et al. [65] proposed M-Net for solving the problem of over-fitting due to the limited dataset. M-Net uses information recorders to store the spatial characteristics and uses spatial similarity to predict the labels of unknown samples. In order to optimize M-Net better, parameter migration training is used. The first step is to train the CNN in M-Net to initialize the parameters. The second step is to use the initialized parameters in M-Net and train on the MSTAR dataset. It achieved 99.71% accuracy on 10-class MSTAR, which showed the effectiveness of M-Net. Lin et al. [66] adopted a highway network to allow information to pass through each layer of the deep neural network at high speed without hindrance, effectively reducing the impact of the vanishing-gradient problem. The convolutional highway network is based on the gate mechanism, including two basic structures: the conversion gate and the handling gate. One part of the input is converted through the conversion gate, and the other part is passed directly through the handling gate. It achieved 99.09% accuracy on 10-class MSTAR.
As CNNs developed, SAR ATR gradually adopted the ideas from this development. These specialized CNNs take into account the specific features of SAR images, such as speckle noise, sensitivity to angle and limited samples.
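The gate mechanism of the highway network described for [66] can be sketched as follows. The sketch uses the standard highway formulation, y = T(x) * H(x) + (1 - T(x)) * x, with fully connected instead of convolutional layers for brevity; it is an illustration under these assumptions, not the implementation of [66], and all names and sizes are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def highway_layer(x, w_h, b_h, w_t, b_t):
    """One dense highway layer: y = T(x) * H(x) + (1 - T(x)) * x."""
    h = np.tanh(x @ w_h + b_h)       # candidate transformation H(x)
    t = sigmoid(x @ w_t + b_t)       # gate T(x): share of the input to convert
    return t * h + (1.0 - t) * x     # the remaining share passes unchanged


rng = np.random.default_rng(0)
dim = 4
x = rng.normal(size=(2, dim))                 # batch of two feature vectors
w_h = rng.normal(scale=0.5, size=(dim, dim))
w_t = rng.normal(scale=0.5, size=(dim, dim))
b_h = np.zeros(dim)
b_t = np.full(dim, -2.0)                      # negative bias favours carrying x
y = highway_layer(x, w_h, b_h, w_t, b_t)
print(y.shape)
```

Because part of the input always reaches the output unchanged, gradients have a direct path through the layer, which is the property credited above with reducing the vanishing-gradient problem.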

Attention-Based CNN
The attention mechanism (for example, SENet and CBAM, the Convolutional Block Attention Module) can assign weights according to the importance of regions or channels. It can capture more valuable information while adding little computation. Thus, it is widely used in computer vision and SAR ATR. Wang et al. [67] highlighted that the CNN can disturb the classifier. Thus, they designed a novel network, ESENet, with an enhanced squeeze-and-excitation module. ESENet has four convolutional layers, three max-pooling layers and one fully connected layer. The enhanced squeeze-and-excitation module uses a convolutional layer which can extract more effective features. It achieved 97.32% accuracy on MSTAR. Shi et al. [68] presented a deep residual shrinkage network with an attention module. The experiments on MSTAR showed that it can reduce the number of parameters while ensuring accuracy. Zhang et al. [69] used an attention module for SAR ATR on limited samples. CBAM is lightweight and effective; it sequentially applies channel and spatial attention to learn "what" and "where". The results on MSTAR showed that it achieved 99.35% on a ten-class dataset. Li et al. [70] proposed channel and spatial attention modules to refine and suppress features. Two lightweight layers are used to encode the weight map. Experiments on MSTAR showed that it has good performance (112.54K parameters with 99.51% accuracy on 10-class MSTAR). Su et al. [71] proposed a complete frequency channel attention network for recognizing noisy images. It uses the 2D discrete cosine transform to select the important channels. The method is robust to noise. The experiments showed that it is better than CBAM (97.65% versus 94.38% on the WHU-SAR6 dataset). Wang et al. [72] presented a multi-view attention network to learn features from different aspects. Spatial attention is used to find the important region. The LSTM (Long Short-Term Memory) is used to fuse the features from adjacent azimuths. It achieved 99.38% accuracy on 10-class MSTAR.
From the above papers, we can see that the attention mechanisms commonly used in SAR ATR are SENet and CBAM. They are borrowed mainly from computer vision. In the future, attention mechanisms should be designed specifically for SAR images.
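To make the channel-weighting idea concrete, the following is a minimal sketch of a squeeze-and-excitation style block applied to a single feature map. It follows the general SENet recipe (global average pooling, a two-layer bottleneck, sigmoid weights) rather than any specific module from the papers above; the weights are random and untrained, and the function name, channel count and reduction ratio are assumptions.

```python
import numpy as np


def se_block(feature_map, w_reduce, w_expand):
    """Reweight the channels of a (C, H, W) feature map.

    w_reduce has shape (C // r, C) and w_expand has shape (C, C // r),
    where r is the reduction ratio (an assumed hyperparameter).
    """
    squeezed = feature_map.mean(axis=(1, 2))                 # squeeze: (C,)
    hidden = np.maximum(0.0, w_reduce @ squeezed)            # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))     # sigmoid in (0, 1)
    return feature_map * weights[:, None, None], weights     # excite channels


rng = np.random.default_rng(0)
channels, reduction = 8, 4
features = rng.normal(size=(channels, 5, 5))
w_reduce = rng.normal(size=(channels // reduction, channels))
w_expand = rng.normal(size=(channels, channels // reduction))
reweighted, channel_weights = se_block(features, w_reduce, w_expand)
print(reweighted.shape, channel_weights.round(2))
```

The block adds only two small matrix products per feature map, which is consistent with the remark above that attention adds little computation.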

Capsule Network
A capsule network can be used to improve the interaction between features. Every capsule is a vector, and only those features with targets can make a contribution to the prediction [73]. Reference [77] proposed a new capsule network for improving performance under EOC. Multiple dilated convolutions are adopted for extracting multi-sized features. Feature refinement is used for extracting discriminative features. A feature pose preserving layer is adopted for high accuracy. It achieved 99.18% accuracy on 10-class MSTAR.
A capsule network performs better than CNN in some cases. However, it requires a large amount of computation, has a narrow range of adaptation and offers little support for other tasks, so it is less used today.

Others
In addition to the above content, there are some other achievements in applying CNN to SAR ATR, for example, regularization and feature fusion.
Feng et al. [78] studied the influences of data augmentation, the L2 regularization term and dropout on MSTAR. They selected AlexNet and ResNet to train the ATR model. Results showed that the AlexNet series with dropout is optimized better and that L2 regularization terms can improve the accuracy. Data augmentation is effective on the small dataset, as deep-learning models are always data-hungry, and SAR images are scarce compared with optical images. Kuang et al. [79] investigated the effect of the amount of training data. The experiments conducted on MSTAR found a good result for the smallest amount of training data. Wang et al. [80] proposed multi-level feature fusion for SAR ATR. The features are from ResNet. Features from different levels are fused to obtain a good classification performance. Li et al. [81] proposed a multi-aspect SAR recognition method based on self-attention. It can find the relationship of the targets in images. A convolutional auto-encoder is used to pre-train the network, which can improve the anti-noise ability and reduce the dependence on a large dataset. Zhao et al. [82] proposed a method based on EfficientNet and GRU (Gated Recurrent Unit), which is robust to the angle of incidence.

Methods to Solve the Problem Raised by Limited Samples
Due to its powerful feature-extracting ability, CNN shows great advantages on SAR ATR. However, the training of CNN depends heavily on labeled data. The performance decreases dramatically when the labeled samples are insufficient. What is more, SAR images are not easily available. Therefore, it is necessary to improve the performance with limited samples. Common methods include data augmentation, transfer learning, generating new samples, few-shot learning, metric learning, semi-supervised learning and adding domain knowledge. They are shown in Figure 11.

Data Augmentation
Data augmentation is commonly used in deep learning to improve the performance of neural networks [83]. Due to the scattering characteristics of SAR, the targets in SAR images may be quite different at different azimuth angles. The rotation method commonly used in optical imaging is not suitable here. How to effectively expand the SAR data needs to be considered. Recently, researchers have carried out research on this issue and have made some progress. The methods fall into the categories shown in Figure 12. Ding et al. [84] studied the results of translation, noise addition and sample synthesis. In the sample synthesis method, in order to generate a sample at a special azimuth angle, the combination of the two closest images is taken as the composite sample. It achieved 93.16% accuracy on 10-class MSTAR. Ding et al. [85] used translation and random speckle noising to strengthen the invariance of CNN models. Hidetoshi et al. [86] discussed the translation invariance of CNN on SAR ATR. The data augmentation is conducted with randomly cropped patches of 96 × 96 pixels from the chips of 100 × 100 pixels in the training phase. They conducted the experiments on MSTAR before and after the data augmentation. The results showed that after data augmentation, it achieved 99.6% accuracy. Jiang et al. [87] presented a Gabor-deep CNN for a limited SAR training dataset. The Gabor features were used at first. Experimentation on MSTAR proved the effectiveness of the method. It achieved 96.32% accuracy on 10-class MSTAR. Lei et al. [88] proposed clutter reconstruction for augmentation. The augmentation is conducted from the aspect of signal and noise. Variable convolution kernels are used to model the spatial correlation. Furthermore, the background reflectance was reconstructed via a power-law transform. Experiments showed that this method is effective and universal. Zhang et al. [89] used existing training samples to build unknown training samples, so as to improve the robustness of CNN and its classification accuracy. Lv et al. [90] presented a data augmentation method based on the ASC (Attribute Scattering Center). It uses sparse representation to extract ASCs from a single image and selects some ASCs to rebuild the image. The rebuilt images are de-noised. By conducting this step several times, new images can be produced as usable training data. A CNN is designed for classification and trained on the enhanced images. On MSTAR, the proposed method can classify 10 classes under SOC with an accuracy of 99.48%.

Data augmentation is widely used in deep learning. There are many effective methods, for example, flipping, cropping, rotation, shifting, random erasing, mosaic, mixup, cutout and cutmix. In SAR ATR, data augmentation is relatively simple to use but less studied. Because SAR data acquisition is limited, it is necessary to focus on data augmentation. Besides data augmentation, GAN and electromagnetic simulation can also be used for generating new samples.
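The cropping and speckle-noising ideas reviewed above can be sketched in a few lines. This is a minimal illustration, not the implementation of any cited paper: the function names `random_crop` and `add_speckle`, the `looks` parameter and the unit-mean gamma model for multiplicative speckle are assumptions made here for the example.

```python
import random

def random_crop(chip, size):
    """Return a random size x size patch from a 2-D list of pixel values."""
    height, width = len(chip), len(chip[0])
    if size > height or size > width:
        raise ValueError("crop size exceeds chip size")
    top = random.randint(0, height - size)
    left = random.randint(0, width - size)
    return [row[left:left + size] for row in chip[top:top + size]]

def add_speckle(chip, looks=4):
    """Multiply every pixel by unit-mean gamma noise (assumed speckle model)."""
    return [
        [pixel * random.gammavariate(looks, 1.0 / looks) for pixel in row]
        for row in chip
    ]

random.seed(0)
# Synthetic 100 x 100 chip standing in for a real SAR target chip.
chip = [[float(r * 100 + c) for c in range(100)] for r in range(100)]
patch = random_crop(chip, 96)   # 96 x 96 patch from a 100 x 100 chip
noisy = add_speckle(patch)      # same shape, multiplicative noise applied
print(len(patch), len(patch[0]))
```

Cropping shifts the target inside the patch without rotating it, which is why it is a natural fit when the azimuth-dependent signature must be preserved.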

GAN for Generating New Samples
A GAN consists of two adversarial networks: a generator and a discriminator. The task of the generator is to produce an image close to the real image. The task of the discriminator is to determine whether the produced image is real. After a substantial amount of training, the generator can produce a near-real image [91]. Training the classifier with samples produced by GANs is a promising approach. The methods fall into the categories shown in Figure 13. Guo et al. [92] adopted GANs to produce SAR images and solved the difficult problem of model training caused by noise through clutter normalization. They compared SAR samples generated by various GAN models, including DCGAN (Deep Convolutional Generative Adversarial Networks) and WGAN (Wasserstein Generative Adversarial Networks) [93,94,95,96]. Cui et al. [97] used WGAN to produce extended data and proposed a data-selection method to select high-quality images with a specific azimuth angle. The performance of this method was demonstrated on the MSTAR classification dataset. It achieved 91.6% accuracy on 10-class MSTAR. Zhu et al. [98] adopted CycleGAN to convert simulated data to real data. CycleGAN is an unpaired domain-adaptive learning method that can realize image conversion between different domains. When CycleGAN is used to refine simulated samples, the training phase builds a generation network that converts simulation samples to real samples. In the test phase, the trained generation network converts simulation samples into samples that are closer to real ones. Simulation samples are effective in improving the performance of the classifier [99]. Results showed that rendering simulated data more similar to real samples with CycleGAN leads to an approximately 10% increase in accuracy. Wagner et al. [100] generated samples through elastic deformation and affine transformation. GAN is capable of generating new data by learning the distribution of data through adversarial training of generators and discriminators. It achieved 99.5% accuracy on 10-class MSTAR. Hwang et al. [101] presented triple-GANs to improve SAR ATR performance. Another classifier was added to make the generator converge to the real data distribution. Luo et al. [102] proposed a method to generate samples of small classes. They were expanded via automatic-search-based data augmentation. This method can produce good samples for small classes so as to solve the imbalance problem. It can improve the accuracy of the minority class by 11.68%. Reference [103] proposed a translation network between optical and SAR images via an improved conditional GAN. It achieved 77.97% accuracy on the SPH4 dataset.
Through training, GAN can generate rich samples, which is a better way to expand data. However, because GANs are difficult to train, unstable and prone to collapse, it is a challenging task to improve the performance of the classifier via adversarial training. Moreover, due to the diversity of SAR imaging and the complexity of its mechanism, the image produced by a GAN still differs from the actual image. This leads to poor transferability of the trained model, which has difficulty adapting to new samples with large changes.
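The adversarial game between generator and discriminator can be made concrete with the two loss terms of the standard GAN formulation. The sketch below is illustrative only: it works on plain lists of discriminator output probabilities, uses the common non-saturating generator loss, and does not reproduce the WGAN or CycleGAN objectives used by the cited works. The function names are hypothetical.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real chips are labeled 1, generated chips 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating loss: small when the discriminator rates fakes as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Discriminator outputs (probability of "real") for a toy batch.
d_real = [0.9, 0.8]
d_fake = [0.1, 0.2]
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

In this toy batch the discriminator is winning, so its loss is small and the generator loss is large; the training instability noted above arises because the two losses pull in opposite directions.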

Electromagnetic Simulation for Generating New Samples
Using electromagnetic simulation to generate new samples is another idea in SAR ATR, as shown in Figure 14. RaySAR is the typical method; it requires electromagnetic parameters related to the target to be set manually, and the quality of the simulated image depends on these settings. In order to determine effective electromagnetic parameters, Niu et al. [104] used a neural network for regression prediction of the electromagnetic simulation parameters. In the training stage, a series of electromagnetic simulation parameters were set according to experience, and simulation images were generated by RaySAR. The simulation images were used as input, and the electromagnetic simulation parameters were used as output to train the model. In the test phase, the real SAR target is input into the trained model, and the output is the best electromagnetic simulation parameters predicted by the network, which can be used to generate SAR simulation samples. Hansen et al. [105] studied transfer learning between simulated and real SAR images. The simulated dataset is obtained from the electromagnetic reflection characteristics. As a result, samples in the simulated dataset do not require geometric duplication. Experiments showed that pre-training on this simulated dataset can make the model converge faster. Cha et al. [106] designed an SAR-simulation data-adjustment method based on a deep residual network. They modeled the mapping from simulated data to real data as a function of the residual network and used this function to adjust the simulated image. Ahmadibeni et al. [107] proposed an SAR image electromagnetic simulation system for ATR. First, 250 CAD models of different objects were prepared. The simulation process consists of four steps. Firstly, the electromagnetic backscatter reflectance of the target is captured. Secondly, simulated samples are generated by using the noise modulation transfer function. Thirdly, the target shadow is projected from eight different perspective views. Finally, the surface regions producing high-intensity backscatter are highlighted to further enhance the realism of the simulated SAR images. Zhang et al. [108] studied accurate recognition using only simulated samples. Due to the distribution difference between simulated and real data, the recognition effect was poor. Therefore, they adopted a hierarchical identification method. Firstly, the pre-trained CNN is adopted to classify the image. Then, the samples that are easy to misclassify are found and reclassified. For these samples, they proposed a multiple-similarity fusion classifier, which measures the relation between them and then reclassifies them.
Electromagnetic simulation can ease the limited-sample problem of SAR ATR and improve the accuracy of recognition. However, there is still a gap in realism between electromagnetically simulated SAR images and actual samples. The fidelity of electromagnetic simulation needs to be improved continuously.

Transfer Learning
Transfer learning can use the common knowledge between the source task and the target task. It is widely used when the training samples are limited. The methods fall into the classes shown in Figure 15. Reference [109] used the CIFAR-10 dataset to pre-train the network; then, the intermediate layers were used for TerraSAR classification. It achieved 64.64% accuracy on the TerraSAR dataset. Marmanis et al. [110] pointed out that due to the great disparity between optical and SAR data, it is difficult to apply a network trained on optical data. Even the low-level network features are difficult to transfer effectively. Lu et al. [111] used off-the-shelf pre-trained models such as ResNet-50 and VGG-16. The low-level neural layers shared common features across different tasks. Thus, they changed only the fully connected layers and classifiers. It achieved 98.57% accuracy on the TerraSAR-X dataset. Zhai et al. [112] presented an efficient transferred CNN for SAR ATR. They first initialized MS-CNN. Then, MS-CNN was trained on a source dataset, and the parameters of the shallow layers (before conv4) were fixed. Finally, the MS-CNN was trained on the target dataset. It achieved 98.83% accuracy on 10-class MSTAR. Ying et al. [113] constructed a lightweight Atrous-Inception module for SAR ATR. In order to train it, several types of images were transferred to the SAR task. Furthermore, the classification performance was improved on limited datasets. It achieved 97.97% accuracy on 10-class MSTAR. Song et al. [114] proposed a data- and feature-level transfer learning method. CycleGAN was used to convert optical images into intermediate-domain SAR images. After that, a domain transfer method was adopted to realize recognition through domain adaptation between the intermediate domain and the target SAR domain. Experiments demonstrated that the two-level structure had good performance on military and civilian classification tasks. Zhang et al. [115] trained CNNs on MSTAR and fine-tuned them on OpenSARShip. It achieved 79.12% accuracy on the OpenSARShip dataset. Huang et al. [116] constructed a large land cover SAR dataset with 150 classes and up to 0.1 million chips. The CNN models trained on this dataset were regarded as the pre-trained models. Furthermore, they were retrained on MSTAR. The results showed that it could obtain an accuracy of 99.46%. This shows the benefit of pre-training on similar datasets. Zhang et al. [117] used many SAR images to train a GAN to learn common features. After that, a pre-trained layer was repeatedly used to transfer general features to SAR ATR tasks. The accuracy improved from 92.76% to 96.24% on the MSTAR dataset. Reference [118] proposed a task-driven domain adaptation transfer learning method based on simulated SAR data. Reference [119] produced a large number of SAR images via simulation. The pre-trained weights were used as the initial parameters and transferred to the actual SAR images. Experiments on MSTAR showed that it could obtain an accuracy of 99.78%. Wang et al. [120] used transfer learning to narrow the gap between simulated and real SAR images. They first pre-trained the model on a substantial amount of simulation data and minimal SAR data. Then, it was fine-tuned on real SAR data. Experiments on MSTAR showed the superiority of the method. It achieved 94.4% accuracy on 10-class MSTAR. Huang et al. [121] built a pre-training model through unsupervised learning by using a stacked convolutional auto-encoder network. Furthermore, they introduced a reconstruction bypass to provide regularization constraints. It achieved 96.62% accuracy on 10-class MSTAR. Borgwardt et al. [122] improved the network via a domain-adaptive learning method based on minimizing the difference. Adaptive learning reduces the difference between the source and target data in the feature domain. Furthermore, the addition of domain-adaptive learning can further improve the performance after transfer. Huang et al. [123] discussed the transfer problem in SAR ATR from three aspects: which network, which layer and how to carry out effective transfer learning. The following conclusions were drawn: large networks have better transfer potential, and the closer the source is to the target, the better the effect.

Transfer learning has achieved good results. However, its theoretical basis is that the source and target domain data have similar characteristics. SAR and optical images differ greatly in imaging mode and noise, so transfer learning needs to be reconsidered in SAR ATR.
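The fine-tuning strategy in which shallow layers are fixed and deeper layers are retrained can be sketched as a simple trainability mask. This is an illustrative sketch only: the layer names and the helper `freeze_shallow_layers` are hypothetical, and a real framework would set the corresponding flag on each parameter tensor.

```python
def freeze_shallow_layers(layer_names, first_trainable):
    """Map each layer name to True (trainable) or False (frozen).

    Layers listed before `first_trainable` keep their pre-trained weights.
    """
    if first_trainable not in layer_names:
        raise ValueError(f"unknown layer: {first_trainable}")
    cut = layer_names.index(first_trainable)
    return {name: index >= cut for index, name in enumerate(layer_names)}

# Hypothetical layer ordering of a small CNN pre-trained on a source dataset.
layers = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc"]
trainable = freeze_shallow_layers(layers, "conv4")
for name in layers:
    print(name, "trainable" if trainable[name] else "frozen")
```

Choosing the cut point is the "which layer" question raised above: the further the source domain is from SAR, the fewer shallow layers are likely to be worth freezing.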

Few-Shot Learning
Few-shot learning can fit an unseen category from limited samples after training on a large amount of data from other categories. The generalization of prior knowledge can be transferred to the new task. It is often treated as a special case of meta-learning. It is also used in SAR ATR, as shown in Figure 16. Wang et al. [124] proposed a few-shot method based on a conv-biLSTM prototypical network. Experiments on an MSTAR dataset with three target types and five training samples showed that it can achieve an accuracy of 90%. Wang et al. [125] combined meta-learning with amortized variable input. The global parameters of meta-learning were used as the extractor. The specific parameters of the probability distribution could adapt to the task with a small number of samples. Experiments showed that it could obtain 97.3% accuracy on MSTAR. Wang et al. [126] proposed a hybrid input network. It consisted of two stages. In the first stage, SAR images were mapped into an embedded space. In the second stage, the samples in the embedded space were classified by combining inductive and transductive reasoning, and the final classification results were obtained from the combination of the two. They also proposed an enhanced mixed loss to obtain better separability between classes. The results on MSTAR showed that it performs well in few-shot SAR classification. In order to transfer prior knowledge from simulation images to SAR images, Wang et al. [127] proposed a method based on probabilistic reasoning and meta-learning. First, they used the features extracted from simulation data to learn the global parameters of the model. Secondly, new features were extracted from the real data. Finally, a prediction distribution was generated to represent the confidence level of the target class. The experimental results showed that the model is superior with limited training samples. It achieved 97.6% accuracy on 10-class MSTAR. In order to learn more discriminant features from labeled data, Wang et al. [128] proposed an attribute-guided multi-scale model. Complex-valued images were used for sub-band decomposition. The proposed model was used to combine multi-scale features and improve the distinguishing ability. A priori binary attributes of the SAR target were used, and an additional classification was added. Li et al. [129] combined graph neural networks with meta-learning and proposed a new graph meta-learning method. Simulated SAR data were first used to obtain meta-knowledge. Furthermore, the labeled and unlabeled data were embedded into vectors, which were represented by a fully connected graph. The graph was iteratively updated via neighborhood aggregation to obtain a new representation of nodes and their relationships. Finally, the prediction distribution of the target class was generated by combining the values of nodes and edges. Experiments showed its superior accuracy with minimal training data. Fu et al. [130] presented a meta-learning framework for SAR ATR. It can learn appropriate update strategies, and it can achieve fast adaptation by training on images of some new tasks. Three transfer learning methods were adopted to overcome the meta-learning problems. The results showed that meta-learning is a good method for SAR ATR with limited samples. It achieved 1.7% and 2.3% improvements for one-shot and five-shot recognition on an NIST-SAR dataset.
Few-shot learning is a solution in the case of insufficient SAR samples. Although its performance is poor compared with that of strong supervision, it still has certain research value. However, as SAR sensors become more common, and it becomes easier to collect large amounts of SAR data, the benefits of such methods will be further reduced.
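The classification rule behind prototypical networks can be illustrated without any deep-learning framework: each class is represented by the mean of its few support embeddings, and a query is assigned to the nearest prototype. In the sketch below the embeddings are hand-written two-dimensional vectors standing in for the output of an embedding network (an assumption for illustration), and the class names and helper functions are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equally sized embedding vectors."""
    count = len(vectors)
    return [sum(values) / count for values in zip(*vectors)]

def build_prototypes(support_set):
    """Map each class label to the mean of its support embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support_set.items()}

def classify(query, prototypes):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Three classes with two support embeddings each (a 3-way, 2-shot episode).
support_set = {
    "class_a": [[0.9, 0.1], [1.1, -0.1]],
    "class_b": [[-1.0, 0.2], [-0.8, 0.0]],
    "class_c": [[0.0, 1.0], [0.2, 1.2]],
}
prototypes = build_prototypes(support_set)
print(classify([0.8, 0.0], prototypes))
```

Because only the prototypes depend on the new classes, adapting to an unseen category requires no gradient update, only a handful of embedded support samples.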

Semi-Supervised Learning
Collecting and labeling SAR images requires a substantial amount of work and is difficult to realize. However, semi-supervised learning can utilize both labeled and unlabeled data and can improve the learning performance. Thus, it attracts the attention of many researchers in SAR ATR. The methods fall into the classes shown in Figure 17. GAN can effectively estimate the distribution of data from training samples, so it could be used for this research [131]. Similar to GAN, semi-supervised GANs also have a generator and a discriminator, but they are more complex. At the beginning, the network can only generate noise-like samples. After a period of training, the generator can generate more realistic samples, indicating that it has learned the distribution of the data. Gao et al. [132] proposed a deep convolutional GAN to conduct semi-supervised learning. Two DCGAN discriminators were used for joint training. Experiments on MSTAR showed that it can obtain an accuracy of 98.14% with a 20% unlabeled rate. Zheng et al. [133] combined GAN with CNN to realize semi-supervised learning. They used GAN to generate labeled images. Label-smoothing regularization was also used. Experiments on MSTAR demonstrated the effectiveness of the method. Gao et al. [134] used more than one generator to realize a stable semi-supervised GAN. A multi-classifier was used, and the labeled images were utilized in the training process, which shares the underlying layers with the discriminator. Then, these layers were fine-tuned with a small number of labeled SAR images to construct the recognition network for SAR images. It achieved 85.23%, 90.82% and 97.81% accuracy with 20%, 40% and 100% of the samples on MSTAR. El-darymli et al. [135] proposed a teacher-student semi-supervised method to train the model on a limited dataset. Firstly, the dataset was divided into consistent and confident unlabeled samples. Then, the student was used to generate the pseudo-labels. Finally, the pseudo-labeled, unlabeled and labeled data were hybridized to train the model. Wang et al. [136] proposed a semi-supervised learning algorithm based on a self-consistent enhancement rule, hybrid-based learning and loss learning. It can utilize unlabeled data during training. Self-consistent enhancement rules force samples to share the same label. It can also balance the amount of labeled and unlabeled data. This could yield the outstanding training effect of supervised learning and lets the network obtain better performance. Then, they mixed labeled, unlabeled and enhanced samples, so that the labeled information could better participate in the mixed samples. The overall loss is the weighted summation of the cross-entropy loss and the mean-square-error loss. Experiments on the MSTAR and OpenSARShip datasets showed that it is close to supervised learning. Gao et al. [137] proposed a semi-supervised classification algorithm based on attention and bias-variance resolving. The training set is represented by the dataset attention module. Unlabeled data that contribute little or are difficult to learn receive less attention. In the training phase, every unlabeled image is fed into the network for prediction. Treating the pseudo-label of each unlabeled sample as its most likely class is good for prediction. It achieved 99.63% accuracy on 10-class MSTAR. Gao et al. [138] presented an active semi-supervised CNN algorithm. An active learning method was used to collect the most likely samples from the unlabeled dataset. A new regularization term was also used in the loss function. The probability of the unlabeled data was maximized by the above operations. The accuracy is 95.7% with only 236 labeled samples. Zhang et al. [139] presented a semi-supervised SAR ATR method. The labeled SAR images were first used to initialize the model. Then, the trained model was used to predict the labels of unlabeled images. After repeating the above steps, they could obtain a robust model. The trained model was used for producing predicted probabilities. Finally, an EM-based method was used to give the predicted labels. It achieved 99.83% accuracy on 10-class MSTAR. Tian et al. [140] proposed a multi-block mixed method for semi-supervised SAR ATR. A multi-block hybrid method was used to produce new SAR images to improve the accuracy. It achieved 99.67% accuracy with 80% labeled samples. Chen et al. [141] presented a semi-supervised algorithm based on a consistency criterion and domain adaptation. Unlabeled data with weak enhancement and strong enhancement are used to predict the pseudo-label and to train the model, respectively.

Semi-supervision is a solution in the case of insufficient SAR samples. Although the performance of semi-supervision is worse than that of strong supervision, it still has certain research value. However, as SAR sensors become more common, and it becomes easier to collect large amounts of SAR data, there will be less room for such methods to work.
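Two building blocks recur in the semi-supervised methods reviewed above: keeping only confident predictions on unlabeled chips as pseudo-labels, and combining a supervised cross-entropy term with a mean-square-error term by weighted summation. The sketch below illustrates both under assumptions made here: the confidence threshold of 0.95, the weight value and the function names are hypothetical and are not taken from any cited paper.

```python
def select_pseudo_labels(probabilities, threshold=0.95):
    """Return (sample index, predicted class) for confident predictions only."""
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected

def combined_loss(cross_entropy, mean_squared_error, weight):
    """Weighted sum of a supervised term and an unsupervised consistency term."""
    return cross_entropy + weight * mean_squared_error

# Class probabilities predicted by a model for three unlabeled chips.
unlabeled_predictions = [
    [0.98, 0.01, 0.01],   # confident: pseudo-label 0
    [0.40, 0.35, 0.25],   # ambiguous: discarded
    [0.02, 0.97, 0.01],   # confident: pseudo-label 1
]
pseudo_labels = select_pseudo_labels(unlabeled_predictions)
print(pseudo_labels, combined_loss(1.0, 0.5, weight=2.0))
```

The threshold trades pseudo-label quantity against label noise, which is one reason these methods remain below fully supervised training when few confident samples are available.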

Metric Learning
For an M-class classification problem containing K training samples, the metric-learning method converts it into the task of determining whether two samples belong to the same category. Two samples belonging to the same class are combined into a positive pair, and two samples belonging to different classes are combined into a negative pair. The total number of positive and negative sample pairs is K(K − 1)/2, which is (K − 1)/2 times the size of the original dataset. Thus, metric learning can be used to alleviate the problem of limited samples. The methods are shown in Figure 18. Xu et al. [142] comprehensively verified the performance of distance metric learning on SAR ATR. Four feature representation methods and twenty distance-metric-learning algorithms were used. The results showed that the feature representation and the distance-metric-learning algorithm are both important for SAR ATR. Pan et al. [143] presented a Siamese convolutional network method for a limited dataset. Firstly, it extracted features through a Siamese network. Secondly, features were extracted from the single-branch network. Finally, the classifier was constructed to realize the recognition of specific types of targets. It achieved 93.20% accuracy with 30 categories. Reference [144] used a positive and negative sample pair strategy to expand the dataset. A Siamese CNN was designed to calculate the similarity. A weighted voting mechanism was applied to the Siamese CNN. The results showed that it is better than other methods on the MSTAR and OpenSARShip datasets. Li et al. [145] conducted SAR ATR via CNN embedding and metric learning. Experiments on OpenSARShip and MSTAR verified the effectiveness of the method. Wang et al. [146] proposed contrastive learning and pseudo-labels to recognize targets under limited samples. They used a Siamese structure to learn semantic representations of objects, and these features could reflect the similarity of SAR images. An iteratively varying loss function was used. It achieved 97.86% accuracy on 10-class MSTAR.
Metric learning has great potential in SAR ATR, but there are few research achievements. Due to the particularity of SAR images, it is necessary to systematically use the metric learning method, which is the direction that should be considered further.
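The pair-counting argument, K(K − 1)/2 pairs from K samples, can be checked with a short sketch that enumerates positive and negative pairs from a label list. The helper name `make_pairs` and the toy labels are hypothetical and serve only to illustrate the counting.

```python
from itertools import combinations

def make_pairs(labels):
    """Split all unordered index pairs into same-class and different-class pairs."""
    positive, negative = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            positive.append((i, j))
        else:
            negative.append((i, j))
    return positive, negative

# K = 6 training samples from M = 3 classes.
labels = ["T72", "T72", "BMP2", "BMP2", "BTR70", "BTR70"]
positive, negative = make_pairs(labels)
total_pairs = len(positive) + len(negative)
print(len(positive), len(negative), total_pairs)
```

With K = 6 the sketch yields 15 pairs, that is (K − 1)/2 = 2.5 times the original sample count. Negative pairs outnumber positive ones, so pair sampling usually needs rebalancing in practice.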

Adding Domain Knowledge
The above work treats an SAR target as a simple category and ignores domain knowledge. In fact, domain knowledge is important information for recognition. The information contained in the target itself, such as length, width and height, is one form of such knowledge. Radar scattering characteristics, such as ASCs, amplitude and phase information, are another. The related papers are shown in Figure 19. Zhang et al. [147] pointed out the importance of domain knowledge in SAR ATR with limited samples. They took the aspect ratio and area of the SAR vehicle as domain knowledge. They used this domain knowledge to correct the output probability of the fully convolutional model. Domain knowledge greatly alleviates the over-fitting problem caused by a small amount of data. Experiments on MSTAR showed that it can achieve 72.2% and 93.1% accuracy under the conditions of ten targets per class and thirty targets per class, respectively. Aiming at the problem of domain-adaptive SAR ATR, Reference [148] proposed a deep knowledge integration framework. Deep knowledge transferring, multiple heterogeneous feature projection and online learning were used to improve the performance.

Imbalance across Classes
Most of the datasets face the problem of imbalance across classes (also called long-tail distribution). When a CNN is trained on these datasets, the majority classes dominate the training, and the accuracy of existing models degrades.
The problem can be alleviated at two levels: the data-sampling level and the algorithm level. Data sampling techniques make the overall training data tend to be balanced. At the algorithm level, the phenomenon of "under-learning" is corrected by optimizing the loss function. A common solution is to increase the penalty for misclassifying classes with fewer samples and to reflect this cost in the loss function, so that more "attention" can be paid to the classes with fewer samples. The methods can be divided into the categories shown in Figure 20. Shao et al. [149] proposed in-batch balanced sampling and model fine-tuning for solving the imbalance problem. Firstly, the training set with known data imbalance was used as the source domain, and the target was rearranged and selected via in-batch balanced sampling. Secondly, the dataset was trained, and the weights of sample balance were saved. Finally, it was trained on the target dataset with unprocessed samples, and the weight network in the source domain was fine-tuned. Cao et al. [150] proposed a cost-sensitive awareness-based recognition model for solving the imbalance problem. At both the data level and the algorithm level, it can improve the performance and learn accurate boundaries. It achieved 90.4% accuracy on MSTAR. Zhang et al. [151] presented a class imbalance loss to tackle imbalanced datasets. The imbalance degree was used as the decision index factor. Yang et al. [152] proposed cascading expert branches and parallel expert branches to solve the imbalance problem. For cascading expert branches, experts are routed sequentially, and each expert uses the entire dataset for training so as to make better predictions for the head class. The parallel expert adopts the rebalancing method in the training process. It achieved 26.02% Top-1 accuracy on an NTIRE2021 SAR dataset. Zhang et al. [153] proposed dynamic sampling and a soft threshold to solve the imbalance problem. The dynamic weighted sampling rendered the distribution of the dataset more reasonable. Experimentation on OpenSARShip showed that it is better than traditional resampling methods. It obtained 80.58% and 77.5% accuracy in the VH and VV channels, respectively. Li et al. [154] presented a two-level jitter network to alleviate the imbalance problem. It decouples the process into representation learning and classification learning.

The imbalance problem is very common in SAR ATR and will seriously reduce the accuracy of the classification algorithm. The best approach is to spend a substantial amount of effort to collect data. However, due to the difficulty of this task, some data-level and algorithm-level methods still need to be adopted in the future to improve the performance.
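An algorithm-level remedy of the kind described above, increasing the misclassification penalty for classes with fewer samples, can be sketched with inverse-frequency class weights applied to a negative log-likelihood loss. This is one common weighting scheme chosen here for illustration and not the specific loss of any cited paper; the class names and function names are hypothetical.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by total / (number of classes * class count)."""
    counts = Counter(labels)
    total, n_classes = len(labels), len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}

def weighted_nll(prob_of_true_class, label, weights):
    """Negative log-likelihood scaled by the weight of the true class."""
    return -weights[label] * math.log(prob_of_true_class)

# Toy long-tailed training set: 8 majority samples, 2 minority samples.
labels = ["cargo"] * 8 + ["tanker"] * 2
weights = inverse_frequency_weights(labels)
print(weights)
print(weighted_nll(0.5, "cargo", weights), weighted_nll(0.5, "tanker", weights))
```

For the same predicted probability, an error on the minority class costs four times as much as one on the majority class in this example, which shifts the decision boundary toward the head class.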

Real-Time Recognition
At present, the commonly used CNN has high accuracy but a large number of layers and parameters and a large storage requirement. It is difficult to implement on an FPGA (Field Programmable Gate Array) or other embedded hardware. This problem can be solved by designing a lightweight model and by using model compression and acceleration, as shown in Figure 21. Model compression mainly includes network pruning, quantization, low-rank decomposition and knowledge distillation. The methods fall into the categories shown in Figure 22.

Reference [155] decomposed the traditional convolution into a cascade of per-channel convolution and per-pixel convolution to reduce the computational burden. Yu et al. [156] proposed a lightweight network called ASIR-Net. It includes channel-attention, channel-shuffle and inverted-residual modules, which are used to extract features with fewer parameters. Zhang et al. [157] presented a lightweight architecture. Pruning was conducted on the convolutional layers to obtain a lightweight network. Then, it was retrained by knowledge distillation. It achieved a reduction in model size by 344 times and a reduction in the by 18 times. Chen et al. [158] first used pruning and adaptive structure compression to accelerate training and inference, and then they quantized and coded the weights to further compress the model. The method achieved a 40-fold reduction in model scale and a 15-fold reduction in computational load without loss of classification accuracy. Min et al. [159] presented a micro-CNN (MCNN) for real-time SAR classification. It had only two layers, and it was compressed from an 18-layer CNN via distillation. The weights of the model were −1, 0 or 1. The teacher network was the DCNN, and the student network was the MCNN. The gradual distillation shows better results than traditional knowledge distillation. The MCNN was compressed 177 times but had similar accuracy when compared with the DCNN. Zhong et al. [160] realized real-time recognition via transfer learning and model compression. The newly appended convolutional layer and global pooling layer were trained on an SAR dataset. Filter pruning was conducted to accelerate inference. It achieved a 3.6-fold acceleration in testing with only a 1.42% decrease in accuracy. Wang et al. [161] designed a lightweight model and compressed it via pruning and knowledge distillation. The convolution kernels with small attention values were pruned. It achieved 99.46% with only 10% of the parameters.
With the gradual improvement of the accuracy of SAR ATR, researchers have paid more attention to how to realize real-time target recognition on end devices. The realization of real-time SAR ATR using model compression and acceleration technology is a key research direction for the future. There are considerable achievements in computer vision, which can provide a reference for the development of this direction.
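The saving obtained by decomposing a standard convolution into a per-channel (depthwise) convolution followed by a per-pixel (pointwise, 1 × 1) convolution can be quantified by counting weights. The sketch below ignores bias terms, and the layer sizes are arbitrary values chosen for illustration rather than taken from any cited network.

```python
def standard_conv_params(kernel, in_channels, out_channels):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * in_channels * out_channels

def separable_conv_params(kernel, in_channels, out_channels):
    """Weights in a depthwise k x k convolution plus a pointwise 1 x 1 convolution."""
    depthwise = kernel * kernel * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise

# Example layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = separable_conv_params(3, 64, 128)
print(standard, separable, round(standard / separable, 2))
```

The ratio of separable to standard weights is 1/out_channels + 1/kernel², so for 3 × 3 kernels the saving approaches a factor of nine as the number of output channels grows.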

Polarimetric SAR
Compared with single-channel SAR images, polarimetric SAR can capture more information through different polarization combinations. Thus, researchers try to use polarimetric SAR images for recognizing targets, as shown in Figure 23. Zhou et al. [162] converted the polarization covariance matrix into a six-dimensional feature vector. The vector was then fed into a network for classification, in which two joined convolutional layers were used. The results on the PolSAR (Polarimetric SAR) dataset showed good performance: it achieved 92.46% accuracy on the 15-class Flevoland test site. Hou et al. [163] used multilayer auto-encoders and super-pixels to classify polarimetric SAR images. Pauli decomposition was first used to generate super-pixels so that the spatial information could be exploited. A multilayer auto-encoder network was then used, which can use both the pixel and spatial features of PolSAR images. It achieved 93.11% accuracy on a Flevoland four-look polarimetric AIRSAR image. Gao et al. [164] proposed a dual-branch CNN for PolSAR classification. It has two CNNs: one is responsible for extracting polarization features, and the other for spatial features. A fully connected layer was used to combine them. It achieved 95.82% accuracy on the RADARSAT-2 dataset. Adugna et al. [165] proposed a fully convolutional network. It used real-valued weight kernels to classify complex-valued images pixel by pixel. The results show that the method has higher accuracy among networks with the same structure. Hua et al. [166] designed a dual-channel CNN for PolSAR images. It includes two parallel CNN modules, which use two multi-scale convolution structures to extract different features. It achieved 82.58% accuracy on a quad-polarized AIRSAR image. Li et al. [167] presented a complex multi-scale network for PolSAR classification. A complex CNN was defined for tackling PolSAR images. A multi-scale contourlet bank was used to extract discriminant features that were multi-directional, multi-scale and multi-resolution. The performance could be improved by substituting the convolution filters. Experiments on PolSAR images showed that it is comparable to the most advanced methods. It achieved 97.78% accuracy on the specific dataset. Xi et al. [168] proposed a fusion Siamese network for dual-polarized SAR ship classification. A two-stream Siamese network was used to combine the polarization SAR images, and a fusion loss was used to improve the accuracy. The classification accuracy on the OpenSARShip dataset reached 87.04%. Shang et al. [169] proposed a densely connected, depthwise separable CNN (DSNet). The separable convolution can learn the features of every channel. DSNet has depthwise separable convolutions and dense connections, which can reduce the parameters (to less than 1/9) and improve accuracy. Zhang et al. [170] proposed an SE (squeeze-and-excitation) Laplacian pyramid network for dual-polarization SAR ATR. It has three parts: dual-polarization feature fusion, SE and a Laplacian pyramid network. SE was used to model the channels and balance the contributions of the polarization characteristics. The Laplacian pyramid network enables multi-resolution analysis. It achieved 56.66% accuracy on a six-class OpenSARShip dataset. Zeng et al. [171] presented a new CNN for ship classification of dual-polarized SAR. The network uses a mixed-channel feature loss and combines the features in the polarization channels. The results showed that it can effectively improve the classification performance. It achieved 82.42% accuracy on the OpenSARShip dataset. Xiong et al. [172] proposed dual-polarimetric SAR ship classification algorithms. The dual-channel loss can fuse features and render the model more fit for dual-polarized images. Results showed that it obtains 87.72%, which is 3.72% higher than the traditional method.
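The Pauli decomposition mentioned above can be illustrated with a minimal sketch. It assumes a single pixel of a fully polarimetric image in the reciprocal case (S_HV equal to S_VH); the function names are hypothetical and are not taken from any of the cited papers.

```python
import math


def pauli_vector(s_hh: complex, s_hv: complex, s_vv: complex):
    """Pauli scattering vector of one pixel, assuming reciprocity (S_HV == S_VH)."""
    scale = 1.0 / math.sqrt(2.0)
    return (
        scale * (s_hh + s_vv),  # odd-bounce (surface-like) component
        scale * (s_hh - s_vv),  # even-bounce (double-bounce-like) component
        scale * (2.0 * s_hv),   # cross-polarized (volume-like) component
    )


def pauli_powers(s_hh: complex, s_hv: complex, s_vv: complex):
    """Power in each Pauli channel, the quantity usually shown as a colour image."""
    return tuple(abs(c) ** 2 for c in pauli_vector(s_hh, s_hv, s_vv))


def span(s_hh: complex, s_hv: complex, s_vv: complex) -> float:
    """Total backscattered power of the pixel."""
    return abs(s_hh) ** 2 + 2.0 * abs(s_hv) ** 2 + abs(s_vv) ** 2
```

The three Pauli powers sum to the span of the pixel, so the decomposition redistributes the total power into physically interpretable channels without losing any of it.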

Polarimetric
Polarimetric SAR data contain more information than amplitude data, but how to use this information to improve the performance of ATR is the direction that needs to be focused on in the future.

Complex Data
For SAR sensors, more information is contained in complex data. The unique phase information of an SAR image is inaccessible to other sensors. However, most CNN-based methods tackle only amplitude data and ignore complex data. Therefore, it is necessary to develop accurate recognition algorithms that extract complex features. The related papers are shown in Figure 24. Zhang et al. [173] proposed a polarization fusion network with geometric feature embedding (PFGFE-Net). PFGFE-Net achieves polarization fusion at the input-data, feature and decision levels. Moreover, the geometric feature embedding enriches expert experience. Results on OpenSARShip reveal PFGFE-Net's excellent performance. Scarnati et al. [174] reviewed the complex neural network techniques for SAR ATR. They commented on the merits and the accuracy of each technique. Zhang et al. [175] proposed a complex-valued CNN that extends the CNN to the complex domain. The network includes input-output, convolution and pooling layers. Taking complex data as input, each layer of the network can transmit phase information. The results on polarization SAR image classification showed that it has better performance than the conventional method. It achieved 99% accuracy on a Flevoland dataset with 14 classes. Sun et al. [176] presented a complex-valued model. They introduced complex-valued operations. The SE module was also used to weight the feature maps. Results on MSTAR showed that it achieves 98.97% accuracy, which is higher than real-valued CNN algorithms. Wang et al. [177] presented a complex-valued CNN, in which the amplitude and phase information are fully utilized. Experiments demonstrated that it is better than traditional real-valued convolutional neural networks. Zeng et al. [178] proposed multi-stream complex-valued networks to use the phase of SAR images. Complex-valued operations were constructed, for example, complex convolution, complex batch normalization, complex activation, complex pooling and complex fully connected layers. Experiments on MSTAR showed that it can obtain better results. Hou et al. [179] proposed a complex online learning network. They believed that the amplitude and phase are important discriminators for recognition. They modeled SAR images by establishing a complex Gaussian distribution model in dictionary learning. Then, a dictionary of the distributed model was learned. Experiments on MSTAR showed that it obtains an accuracy of 94.52% with 20% of the samples.
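As an illustration of the complex convolution named above, the following minimal sketch shows how a complex-valued convolution can be assembled from four real-valued ones, which is a common way to build such a layer on top of real-valued tooling. It is a one-dimensional toy in plain Python with hypothetical function names, not the implementation of any cited network.

```python
def conv1d_valid(signal, kernel):
    """'Valid' sliding-window correlation, the operation CNN layers call convolution."""
    n_out = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n_out)
    ]


def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex convolution from real parts only.

    (x_re + i*x_im) * (w_re + i*w_im)
        = (x_re*w_re - x_im*w_im) + i*(x_re*w_im + x_im*w_re)
    """
    rr = conv1d_valid(x_re, w_re)
    ii = conv1d_valid(x_im, w_im)
    ri = conv1d_valid(x_re, w_im)
    ir = conv1d_valid(x_im, w_re)
    out_re = [a - b for a, b in zip(rr, ii)]
    out_im = [a + b for a, b in zip(ri, ir)]
    return out_re, out_im
```

Because the real and imaginary outputs both depend on both input parts, the phase of the input influences every output value, which is the property that amplitude-only networks give up.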

Complex data
Complex data contain more information than amplitude data, but how to tackle this information requires more research in the future.

Others
Besides the above directions, there are four other directions that are also studied by researchers, as shown in Figure 25. They are the use of the attributed scattering center (ASC), the combination of traditional features with CNNs, explainability and adversarial attacks. They are reviewed as follows.

Current SAR ATR algorithms are aimed at the amplitude of SAR images (data-driven networks). Thus, the physical model is not utilized much. As the ASC can describe the characteristics and physical structure information of the target, it is necessary to use it in ATR. When the radar works at high frequency, the scattering field of the target can be approximated as the accumulation of its scattering centers. A series of parameters can describe the characteristics of each scattering center. The parameters contain rich physical and geometric properties, which can accurately describe the real scattering mechanism of the target. The parameter set of ASCs is in the form of a point set, and a CNN is not suitable for directly processing point-set data. Many studies have been performed to fuse CNNs and ASCs, as shown in Figure 26. Feng et al. [180] integrated a parts model and deep learning to render the method more interpretable and powerful. It was computed via ASCs, and the local features came from the parts model. It achieved 99.79% accuracy on MSTAR SOC. Liu et al. [181] proposed SDF-Net to fuse physical knowledge and deep features. The physical knowledge was represented by the ASC data. Experiments on MSTAR showed its effectiveness and robustness. Li et al. [182] also combined electromagnetic-scattering information and a graph convolutional network. They modeled every scattering center and converted the set into a graph, which was used to represent the structure features. Jiang et al. [183] also combined a CNN and ASCs. The test sample was first processed by the CNN. If the output is not reliable, ASC matching further identifies it. It achieved 99.41% accuracy under MSTAR SOC. Li et al. [184] proposed an ASCM and a discriminative dictionary learning method. It has three steps. The low-level local features, the label-consistent discriminative dictionary learning and the spatial-pyramid matching were used to make full use of SAR images and ASCs. Zhang et al. [185] fused scattering center features and CNN models. The ASCs are extracted from complex SAR data. A modified VGG-Net was adopted to extract deep features from SAR images. Discrimination correlation analysis was used to fuse the features. Zhang et al. [186] proposed an attributed scattering-center-matching-based noise-robust recognition method. The ASCs are extracted based on sparse representation. A Hungarian algorithm is adopted to pair the template ASC sets. It achieved 97.54% accuracy under MSTAR SOC.
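The pairing of two scattering-center sets can be sketched as a minimum-cost assignment problem. The example below is a simplification under stated assumptions: each center is reduced to a hypothetical 2-D position, the cost is the Euclidean distance, and an exhaustive search over permutations stands in for the Hungarian algorithm, which finds the same optimal pairing in polynomial time. It does not reproduce the matching rule of any cited method.

```python
import math
from itertools import permutations


def match_scattering_centers(test_pts, template_pts):
    """Pair every test center with a distinct template center at minimal total distance.

    Brute-force stand-in for the Hungarian algorithm; only practical for small sets.
    Returns (pairs, total_cost), where pairs is a list of (test_idx, template_idx).
    """
    if len(test_pts) > len(template_pts):
        raise ValueError("need at least as many template centers as test centers")
    best_cost, best_perm = math.inf, ()
    for perm in permutations(range(len(template_pts)), len(test_pts)):
        cost = sum(
            math.dist(test_pts[i], template_pts[j]) for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost
```

A small total cost indicates that the test sample's scattering centers line up well with the template, so the cost can serve as a similarity score between a test target and each candidate class.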

Combining the Traditional Features with CNN
A CNN has shown better accuracy than traditional hand-crafted features. However, traditional features were designed by experts, which supports their interpretability. Thus, many researchers seek to combine the two. Zhang et al. [187] tried to inject traditional features into a CNN to improve the performance of SAR ATR. They assumed that the traditional features can further improve the classification performance. The HOG, NGFs, LRCS and PAFs were used. They can be injected at the convolutional, residual and dense blocks and at the FC layer, while the main body of the CNN remains unchanged. The researchers also used seven methods to perform the injection. Results showed that the accuracy improved by 6.75% after the injection. Zhang et al. [188] also proposed another method to integrate traditional features into a CNN. The edge, Harris and HOG features were used, together with classical CNNs. Experiments showed that the integration substantially improves the accuracy.

Explainable
A CNN can mimic the human brain and is able to extract features automatically. It has shown good results in SAR ATR. However, it works similarly to a black box; its decision process is not transparent. This can lead to security risks and reduce the trust in the algorithms. Thus, many researchers try to explain CNNs in SAR ATR. Mandeep et al. [189] used an explainable artificial intelligence system to verify the trained CNN model. It explains the test images by marking the decision boundary. It is a transparent learning method. Guo et al. [190] explained SAR ATR via model understanding, diagnosis and so on. Feng et al. [191] proposed a method to visualize the SAR ATR model. It assigns a pixel-wise weight matrix to different channels. Li et al. [192] proposed SAR-BagNet for SAR ATR. SAR-BagNet can show a heat map that reflects the contribution of each part. Research in this direction is necessary, as it can improve the validity of AI systems.

Adversarial Attack
Though deep-learning-based SAR ATR methods show good performance, they are easily attacked by adversarial samples. By adding a small perturbation to the input, these samples can cause a CNN to output the wrong labels intended by the attacker. Some researchers have studied this problem in recent years. Huang et al. [193] used several methods to demonstrate that CNNs are easily attacked by adversarial examples. Sun et al. [194] also conducted a detailed adversarial robustness evaluation of CNN-based SAR ATR. Seven different adversarial perturbations were used for generating adversarial samples. The average recognition accuracy under attack was used as the evaluation metric. Du et al. [195] built a UNet-generative adversarial network to generate adversarial examples. The experiments showed that high-quality adversarial examples have good attack results. Zhang et al. [196] proposed an SAR-characteristic-based adversarial deception method. Its perturbations have better results than other methods. Peng et al. [197] proposed a speckle-variant attack method. It consists of two parts: an iterative gradient-based generator and a region extractor. It is easy to generate good adversarial examples. The above work shows that the deep learning used in SAR ATR is very easy to attack. This is one of the disadvantages of deep learning. We should consider this problem when designing SAR ATR systems for real working conditions.
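To make the notion of an adversarial perturbation concrete, the sketch below applies one step of the fast gradient sign method (FGSM), a well-known gradient-based attack that is swapped in here for illustration and is not one of the surveyed SAR-specific methods. A logistic classifier with hypothetical weights stands in for a CNN so that the input gradient can be written in closed form.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x) -> float:
    """Probability of class 1 under a logistic classifier (stand-in for a CNN)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign_of(g: float) -> int:
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, eps):
    """One fast-gradient-sign step of size eps that increases the loss on label y.

    For the cross-entropy loss of a logistic model, dL/dx = (p - y) * w.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign_of(g) for xi, g in zip(x, grad)]
```

Each input value moves by at most eps, yet the perturbation is aligned with the loss gradient in every dimension, which is why a change that looks minor per pixel can still shift the prediction across the decision boundary.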

Future Directions
Recently, deep learning has dominated computer vision tasks such as detection, recognition and segmentation. As a result, SAR ATR researchers have also adopted deep learning, and a large number of methods have emerged recently. However, compared with computer vision, deep-learning-based SAR ATR still faces many problems that need to be solved. They fall into the following classes, as shown in Figure 27.

The Dataset
Compared with optical images, SAR images are more sensitive to imaging parameters and observation attitude, so the same target exhibits more diversity in them. Therefore, a recognition dataset larger than the optical image datasets is needed. However, it is difficult to obtain SAR images, which results in small datasets. This contradiction between supply and demand renders SAR ATR more difficult.
In the future, researchers need to consider constructing such a dataset. They should recognize its difficulty and importance and be willing to cover the substantial costs involved. When doing this, the imbalance problem should also be considered. Data augmentation can be used to compensate for the lack of a large SAR ATR dataset. Beyond these, how to design weakly supervised or unsupervised learning algorithms that work with few samples should also be studied further.

CNN Architecture Designing
At present, the CNNs used in SAR ATR are partly specialized and partly borrowed from computer vision. Both have their advantages and disadvantages. A specially designed CNN can fully consider the characteristics of SAR ATR, while a CNN borrowed from computer vision can have a strong feature extraction ability. Therefore, the two ideas need to be explicitly unified when designing the CNN architecture. Moreover, the number of channels and the number of parameters should also be considered. The design should also maximize the intra-class compactness and the inter-class separation simultaneously.

Knowledge-Driven Dataset
For SAR ATR, most of the current work is focused on the image itself (that is, it is data-driven), and some knowledge (motion features, geometric features, scattering features, etc.) is ignored. In fact, this knowledge is also critical for recognition. Thus, we should integrate it into the CNN to further improve the recognition accuracy. The premise of research in this direction is to establish a knowledge dataset, which is relatively difficult to achieve.

Real-Time Recognition
With the maturity of CNN-based SAR ATR in recent years, the demand for real-time application deployment is becoming increasingly urgent. Lightweight CNN structure design, model compression and acceleration, and hardware deployment are the key technologies for achieving real-time recognition and need to be focused on in the next step. It should be noted that lightweight networks that rely extensively on depthwise and pointwise convolutions are not necessarily fast: because these operations are often not well optimized on the hardware, they should be used sparingly.
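The saving offered by depthwise and pointwise convolutions can be checked with simple arithmetic. The sketch below counts the weights of a standard convolution and of its depthwise separable replacement, ignoring biases; the function names and the layer sizes in the usage note are hypothetical.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def parameter_ratio(k: int, c_in: int, c_out: int) -> float:
    """Separable / standard weight count, which simplifies to 1/c_out + 1/k**2."""
    return separable_conv_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)
```

For a 3 x 3 layer with 64 input and 128 output channels, the count drops from 73,728 to 8,768 weights, a ratio of about 0.12. This is a count of parameters only; as noted above, it does not by itself guarantee a shorter inference time on a given device.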

Explainable and Adversarial Attack
Although the CNN has shown great advantages in SAR ATR, its working mechanism is not transparent; it operates as a black box. Future work should aim to improve the interpretability of CNNs. This can help people understand how the deep-learning model learns, what it learns from the data, why it makes a given decision for each input sample, and whether its decisions are reliable.
The CNN is vulnerable to adversarial samples. If the input is slightly modified, the network can give different results. The characteristics of the target in a radar image are affected by many factors. If the robustness of deep learning is insufficient, it is difficult to apply it to actual scenes. Thus, in the future, we need to focus on improving its resistance to adversarial attacks.

Conclusions
This paper gives a comprehensive survey of SAR ATR. Firstly, the datasets and the evaluation metrics were introduced. The problems of limited samples and unbalanced distribution were also pointed out. Secondly, the traditional ATR methods, including template-matching-based, machine-learning-based and model-based methods, were introduced in that order. The machine-learning-based methods now show popularity in this area. Thirdly, the deep-learning-based methods were introduced thoroughly. This part is the core of the paper. The non-CNN models and the CNN models were reviewed at the beginning. Then, the methods to solve the limited-sample problem, including data augmentation, GAN, electromagnetic simulation, transfer learning, few-shot learning, semi-supervised learning, metric learning and domain knowledge, were surveyed in detail. After that, the imbalance problem, real-time recognition, polarimetric SAR, complex data, the attributed scattering center, adversarial attacks and explainability were surveyed thoroughly and in that order. Finally, the future directions of SAR ATR were introduced. In the future, we should construct a massive dataset, design specialized CNNs, add knowledge to CNNs, realize real-time recognition and improve explainability and robustness to adversarial attacks. To the best of our knowledge, this work represents the first comprehensive review of the research in the field of deep-learning techniques used for SAR ATR.

Figure 1. The process of SAR ATR.

Figure 2. The framework of the paper.

Figure 4. The datasets used in SAR ATR.

MSTAR is the first public dataset constructed by DARPA (Defense Advanced Research Projects Agency). It contains 10 categories of former Soviet military vehicles. The data are collected via X-band SAR, the imaging mode is spotlight, the polarization mode is HH, and the resolution is 0.3 m × 0.3 m. The angle is 0 degrees to 360 degrees; the angle interval is 3 degrees; and the size is 128 × 128. It has 120 slices. MSTAR includes SOC (Standard Operating Condition) and EOC (Extended Operating Condition). SOC represents that the elevation angle and azimuth angle are different. EOC refers to the large difference between the test and training sets, mainly in the large change in elevation angle, configuration and the different models of the same type. OpenSARShip was constructed by Shanghai Jiaotong University. The information of OpenSARShip is shown in Table 2. It contains common types of civilian ships. The ships

Figure 5. The confusion matrix. From this figure, we can understand the concept of TP (True Positive), FP (False Positive), FN (False Negative) and TN (True Negative). Based on the confusion matrix, the false positive rate and the true positive rate are calculated as follows:
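Under the standard definitions, which are assumed here because they follow directly from the four counts of the confusion matrix, the two rates can be written as:

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad
\mathrm{FPR} = \frac{FP}{FP + TN}
```

The true positive rate is the fraction of actual positives that are recognized, and the false positive rate is the fraction of actual negatives that are wrongly reported as positives.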

Figure 7. The process of machine-learning-based SAR ATR algorithms.

proposed an improved CNN called Q-Net based on the characteristics of SAR images. The experiments are conducted on MSTAR. Q-Net has only three convolutional layers, which is very shallow compared with the classical CNNs. It achieved 97.58% accuracy on three-class MSTAR and 97.32% accuracy on ten-class MSTAR. Zhou et al. [52] used a large-margin softmax and batch-normalization-based CNN to increase the separability of samples. It has only four convolutional layers. The experiments conducted on MSTAR showed the robustness of the classifier. It achieved 96.44% accuracy on 10-class MSTAR. Cho et al. [53] proposed a two-way feature additional CNN for considering the pose information of the target. The two-way features are aggregated and input into the fully connected layers. The CNN that they used has seven convolutional layers. It achieved 94.38% accuracy on MSTAR. Zhao et al. [54] used a multi-stream CNN (MS-CNN) for solving the problem of limited data. It has only four convolutional layers. Multiple views of the same target are input to MS-CNN. The experiments conducted on MSTAR SOC and EOC showed the superiority of the method. It achieved 99.92% accuracy on 10-class MSTAR under SOC. Lang et al. [55] presented LW-CMDANet. It designs a four-layer CNN model combined with hinge loss. It achieved 92.98% accuracy on 10-class MSTAR.

Figure 11. Methods to solve the problem raised by limited samples.

Figure 21. Lightweight CNNs, and model compression and acceleration, are two ways to implement real-time recognition.

Figure 22. The real-time recognition in SAR ATR.

Author Contributions: Conceptualization, J.L. and Z.Y.; methodology, J.C. and L.Y.; investigation, C.C.; writing-original draft preparation, J.L.; writing-review and editing, J.L.; supervision, Z.Y.; funding acquisition, P.C. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Data Availability Statement: No new data were created in this paper.

Table 1. The number of the traditional and deep-learning-based SAR ATR papers.

Table 2. The statistics of OpenSARShip and OpenSARShip 2.0.

OpenSARUrban is used for the interpretation of urban SAR images. It provides 33,358 patches covering 21 major cities. It can be used for urban target classification and content-based image retrieval.

Shah et al. [74] adopted a capsule network for SAR ATR. It has one convolutional and two capsule layers. Its demand for training data is small. It has an accuracy of 98.14% on MSTAR. Yang et al. [75] combined dilated convolution and a capsule network for SAR ATR. They are less hungry for training samples. It achieved 97.15% accuracy on 10-class MSTAR. Guo et al. [76] used a capsule network for high-accuracy recognition. It can connect every target in an SAR image. It is learned through a vector-based fully connected operation. It shows superior robustness compared with CNNs. Ren et al.