1. Introduction
As a longstanding and challenging problem in Synthetic Aperture Radar (SAR) imagery interpretation, SAR Automatic Target Recognition (SAR ATR) has been an active research field for several decades. SAR ATR plays a fundamental role in various civil applications, including prospecting and surveillance, and military applications such as border security [1]. (Armored) vehicle recognition [2,3,4] in SAR ATR aims to give machines the capability of automatically identifying the classes of armored vehicles of interest (such as tanks, artillery, and trucks), which is the focus of this work. Recently, high-resolution SAR images have become increasingly easy to produce, offering great potential for studying fine-grained, detailed SAR vehicle recognition. Despite decades of effort by researchers, including recent successful preliminary attempts driven by deep learning [5,6,7,8], as far as we know, SAR vehicle recognition remains an underexploited research field with the following significant challenges [9].
The lack of large, realistic, labeled datasets. Existing SAR vehicle datasets, i.e., the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset [10], are too small and relatively unrealistic: they cannot represent the complex characteristics of SAR vehicles [1], including imaging geometry, background clutter, occlusions, speckle noise, and true data distributions, yet they allow many machine-learning methods to achieve high performance easily when training samples are abundant. Such SAR vehicle datasets are admittedly hard to create due to the non-cooperative application scenario and the high cost of expert annotations, so label-efficient learning methods deserve attention in this context. In other words, in SAR vehicle recognition missions, labeled SAR images of armored vehicles are usually difficult to obtain and interpret in practice, which leads to an insufficient-sample situation in this field [11].
Large intra-class variations and small inter-class variations. Variations in imaging geometry, such as the imaging angle (including azimuth and depression angles), imaging distance, and background clutter, have a remarkable effect on vehicle appearance in SAR images (examples shown in Figure 1), causing large intra-class variation. The same variations in imaging conditions can also cause vehicles of different classes to manifest highly similar appearances (examples shown in Figure 1), leading to small inter-class variations. Thus, SAR vehicle recognition demands robust yet highly discriminative representations that are difficult to learn, especially from a few labeled samples.
The more difficult recognition missions under extended operating conditions (EOCs). Under the MSTAR standard operating condition (SOC), the training and testing samples differ only in depression angle, which is 17° and 15°, respectively. Under EOCs, in contrast, variations in depression angle and in target configuration or version lead to obviously different imaging behaviors among SAR targets. Thus, recognition missions under EOCs are much more difficult than under SOC in the MSTAR dataset, and this phenomenon also exists in few-shot recognition missions.
Recently, in response to the aforementioned challenges, few-shot learning (FSL) [12] has been introduced into SAR ATR recognition missions, aiming to raise the recognition rate with only a few labeled data. The lack of training data suppresses the performance of CNN-based SAR target classification methods, which achieve high recognition accuracy only when labeled data are sufficient [2]. To handle this challenge, simulated SAR images generated from CAD models and electromagnetic scattering mechanisms have been introduced into SAR ATR to raise recognition accuracy [13,14,15]. Although some common information can be transferred from labeled simulated data, huge differences remain between simulated and measured data. The surroundings of the imaging target, disturbances of the imaging platform, and even the material of the vehicles make it hard to simulate samples from the real environment. Consequently, some scholars prefer to leverage unlabeled measured data, instead of simulated data, in their algorithms, which gives rise to semi-supervised SAR ATR settings [16,17,18,19].
Building upon our previous study in [11], this paper presents the first study of semi-supervised few-shot learning (SSFSL) in the field of SAR ATR, aiming to improve the model by making use of labeled simulated data and unlabeled measured data. Beyond leveraging these data, azimuth-angle information is regarded as a significant kind of knowledge for mining discriminative representations in this paper. When there are enough labeled SAR training samples, the feature-embedding space of a category, organized by azimuth angle, is approximately complete from 0° to 360°. Hence, in this situation, the influence of a few missing azimuth angles on the recognition rate is limited. Nevertheless, if there are only an extremely small number of labeled samples, their azimuth angles will dominate the SAR vehicle recognition results.
Figure 1 shows four selected categories of SAR images from MSTAR SOC after azimuth-angle normalization [11]. It is obvious that SAR vehicle images from the same category with large differences in azimuth angle exhibit quite different backscattering behaviors, which can be regarded as high intra-class diversity. When the azimuth-angle difference between samples exceeds 50°, their backscattering behaviors, including the shadow and target areas of the target, are dissimilar, as seen in the samples in the same row of Figure 1. Meanwhile, SAR vehicle images with the same or adjacent azimuth angles from different categories share similar backscattering behaviors, which constitutes inter-class similarity. The samples in the same column of Figure 1 are homologous in the appearance of the target and shadow areas, especially when the vehicle types are similar, for instance, the BTR60/BTR70 and T62/T72 pairs. These two properties of SAR images cause confusion in representation learning and mistakes in classification results.
To solve this problem, an azimuth-aware discriminative representation (AADR) learning method is proposed; this algorithm can grasp distinguishable information through azimuth angles in both labeled simulated data and unlabeled measured data. The motivation of the method is to design a specific loss that lets the model learn not only category information but also azimuth-angle information. To suppress intra-class diversity, pairs of SAR samples from the same category with large azimuth-angle differences are selected, and the absolute value of the cosine similarity of their representations is pushed from near zero toward one. Simultaneously, to enlarge inter-class differences, samples from different categories with the same azimuth angle are selected, and their feature vectors are pushed from approximate overlap toward orthogonality under the cosine-similarity metric. Following this idea, the azimuth-aware regular loss (AADR-r) and its variant, the azimuth-aware triplet loss (AADR-t), are proposed; the details are introduced in Section 3. Furthermore, the cross-entropy loss on the labeled simulated datasets and the KL divergence of pseudo-labels on the unlabeled measured dataset (MSTAR) are also included in the proposed loss. After training with the proposed loss, the model learns discriminative representations from the few-shot samples and is tested on the query set.
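The two azimuth-aware terms described above can be sketched in plain Python. This is a minimal illustration of the idea only, not the authors' implementation; the pair lists, the equal weighting of the two terms, and the averaging are assumptions, and the exact formulation appears in Section 3.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def azimuth_aware_term(intra_pairs, inter_pairs):
    """intra_pairs: same category, large azimuth-angle difference;
    the term pushes |cos| toward 1 (aligned directions).
    inter_pairs: different categories, same azimuth angle;
    the term pushes |cos| toward 0 (orthogonal directions)."""
    intra = sum(1.0 - abs(cosine(u, v)) for u, v in intra_pairs)
    inter = sum(abs(cosine(u, v)) for u, v in inter_pairs)
    return intra / max(len(intra_pairs), 1) + inter / max(len(inter_pairs), 1)
```

In the full objective this term would be combined, with a weighting factor, with the cross-entropy loss on the simulated data and the KL-divergence consistency term on the pseudo-labeled measured data.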
Following the SSFSL baseline, there is no category overlap between the source domain and the target domain. Simulated data in the source domain are abundant, whereas the target domain contains only an extremely small number of labeled measured samples alongside sufficient unlabeled measured data. According to the SSFSL settings, samples in the support and query sets are distinguished by different depression angles, and the unlabeled data are chosen only from the samples in the support set.
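To make this split concrete, the sketch below builds one SSFSL episode. The data layout (a per-class pool holding support-side and query-side samples taken at different depression angles) and all function and parameter names are hypothetical illustrations, not the paper's code.

```python
import random

def build_episode(pool, n_way, k_shot, n_query, n_unlabeled, seed=0):
    """pool: dict mapping class name -> {'support': [...], 'query': [...]},
    where the two lists come from different depression angles.
    Unlabeled samples are drawn only from the support-side pool,
    mirroring the setting described in the text."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(pool), n_way)
    episode = {"support": [], "query": [], "unlabeled": []}
    for c in classes:
        sup = rng.sample(pool[c]["support"], k_shot + n_unlabeled)
        episode["support"] += [(c, s) for s in sup[:k_shot]]
        # labels of the remaining support-side draws are discarded
        episode["unlabeled"] += list(sup[k_shot:])
        episode["query"] += [(c, q) for q in rng.sample(pool[c]["query"], n_query)]
    return episode
```

For a 10-way 1-shot SOC episode, one would call this with `n_way=10` and `k_shot=1`.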
Extensive comparison and ablation experiments were carried out to demonstrate the performance of our method. In general, the three contributions of this paper are summarized below:
Due to the lack of large and realistic labeled datasets for SAR vehicle targets, we propose, for the first time, the setting of semi-supervised few-shot SAR vehicle recognition, which takes both unlabeled measured data and labeled simulated data into consideration. In particular, simulated datasets act as the source domain in FSL, while the measured MSTAR dataset serves as the target domain. Additionally, the unlabeled data in the MSTAR dataset are available during model training. This configuration is very close to the practical task of few-shot SAR vehicle recognition, in which labeled simulated data and unlabeled measured data can be obtained easily.
An azimuth-aware discriminative representation loss is proposed to learn the similarity of representations of intra-class samples with large azimuth-angle differences in the labeled simulated datasets. The representation pairs are treated as feature-vector pairs, which are pulled close to each other in direction. Meanwhile, the inter-class differences of samples with the same azimuth angle are also expanded by the proposed loss in the feature-embedding space: the designed cosine-similarity distance drives inter-class representation pairs toward orthogonality.
Azimuth-angle information and phase-data knowledge are adopted in the SAR vehicle data pre-processing stage. Moreover, the variants of the azimuth-aware discriminative representation loss achieve 47.7% (10-way 1-shot SOC), 71.05% (4-way 1-shot EOC1), 86.09% (4-way 1-shot EOC2/C), and 66.63% (4-way 1-shot EOC2/V) accuracy, respectively. Plenty of comparison experiments with other FSL and SSFSL methods prove that our proposed method is effective, especially on the three EOC datasets.
The remainder of this paper is organized as follows. Section 2 introduces the related work: semi-supervised learning and its applications in SAR ATR, FSL and its applications in SAR ATR, and SAR target recognition based on azimuth angle. The setting of SSFSL for SAR target classification is presented in Section 3.1. Then, the whole framework of the proposed AADR-r is shown in Section 3.2, and AADR-t is described in Section 3.3. In Section 4, experimental results under SOC and the three EOCs are demonstrated in diagrams and tables. Sufficient comparison experiments, ablation experiments, and implementation details are introduced and analyzed in Section 5. Finally, the paper is concluded and future work is outlined in Section 6.
4. Experiments
To test the validity of AADR for semi-supervised few-shot SAR vehicle classification, extensive experiments were performed under the setting that the public simulated SARSIM dataset and the simulated part of the Synthetic and Measured Paired Labeled Experiment (SAMPLE) dataset were combined as the source domain, while the public MSTAR dataset served as the target domain. The few-shot labeled data are sampled from the target domain, and the data in the query set come from different depression angles or different target types; the unlabeled data are all from the target domain. Take MSTAR SOC (standard operating condition) as an example: the few-shot labeled data comprise the support set, the unlabeled measured data are selected from the 17° depression-angle set in MSTAR SOC, and the samples at a 15° depression angle in MSTAR SOC compose the query set. Comparison experiments with traditional classifiers, other advanced FSL approaches, and semi-supervised learning approaches were conducted. Additionally, ablation experiments with different feature dimensions, various base datasets, and errors in azimuth-angle estimation are also included in our work.
Without phase data, the samples in the SAMPLE and SARSIM datasets undergo only gray-image adjustment and azimuth-angle normalization, whereas the samples in the MSTAR database additionally undergo the phase-data augmentation described in [11].
The feature-extractor network in Figure 2 contains four fully convolutional blocks [64], each owning a 3 × 3 convolution layer with 64, 128, 256, and 512 filters, respectively, a 2 × 2 max-pooling layer, a batch-normalization layer, and a ReLU nonlinearity layer. We use the SGD optimizer with a momentum of 0.9 and weight decay. All experiments were run on a PC with an Intel i9 CPU, four Nvidia GTX-2080 Ti GPUs (12 GB VRAM each), and 128 GB RAM, running Ubuntu 20.04. All experiments were implemented in Python on the PyTorch deep-learning framework with the CUDA 10.2 toolkit.
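The four-block extractor described above can be sketched in PyTorch as follows. This is a sketch under stated assumptions, not the authors' exact network: the single input channel and the final global average pooling to one 512-dimensional vector are not specified in the text and are assumed here.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # 3x3 convolution -> batch normalization -> ReLU -> 2x2 max-pooling
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class Conv4Extractor(nn.Module):
    """Four fully convolutional blocks with 64, 128, 256, and 512 filters."""

    def __init__(self, in_ch: int = 1):  # assumption: single-channel SAR input
        super().__init__()
        self.blocks = nn.Sequential(
            conv_block(in_ch, 64),
            conv_block(64, 128),
            conv_block(128, 256),
            conv_block(256, 512),
        )
        # assumption: collapse the spatial map into one 512-dim feature vector
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.blocks(x)).flatten(1)
```

The optimizer described in the text would then be `torch.optim.SGD(model.parameters(), ..., momentum=0.9, weight_decay=...)`, with the learning rate and weight-decay values left to the original setup.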
4.1. Datasets
(1). SARSIM: The public SARSIM dataset [15] contains seven kinds of vehicles (humvee 9657 and 3663, bulldozer 13013 and 8020, tank 65047 and 86347, bus 30726 and 55473, motorbike 3972 and 3651_Suzuki, Toyota car and Peugeot 607, and truck 2107 and 2096). Every image is simulated under conditions identical to MSTAR, at a 5° azimuth-angle interval and at the following depression angles (15°, 17°, 25°, 30°, 35°, 40°, and 45°), so there are 72 samples in each category under a given depression angle.
(2). SAMPLE: The public SAMPLE dataset [65,66] was released by the Air Force Research Laboratory with both measured and simulated data for 10 sorts of armored vehicle (tracked cargo carrier: M548; military truck: M35; wheeled armored transport vehicle: BTR70; self-propelled artillery: ZSU-23-4; tanks: T-72, M1, and M60; tracked infantry fighting vehicles: BMP2 and M2; self-propelled howitzer: 2S1). The samples in the SAMPLE dataset are 128 × 128 pixels, with azimuth angles from 10° to 80° and depression angles from 14° to 17°. For every measured target, a corresponding synthetic image is created with the same sensor and target configurations but totally different background clutter. To keep the categories in the few-shot recognition stage and the pre-training stage different, in most experimental settings in this article only the synthetic images in the SAMPLE dataset are leveraged, combined with the SARSIM dataset to expand the richness of categories in the base dataset.
(3). MSTAR: The MSTAR SOC dataset [10], including ten kinds of military vehicles of the Soviet era (military truck: ZIL-131; tanks: T-72 and T-62; bulldozer: D7; wheeled armored transport vehicles: BTR60 and BTR70; self-propelled howitzer: 2S1; tracked infantry fighting vehicle: BMP2; self-propelled artillery: ZSU-23-4; armored reconnaissance vehicle: BRDM2), has long been the benchmark for verifying algorithm performance on SAR vehicle classification missions. Imaged by an airborne X-band radar, the samples in this dataset are in HH polarization mode with a resolution of 0.3 × 0.3 m. Targets at a 17° depression angle form the support set and supply the unlabeled measured data, and targets at 15° are for testing; the numbers per category are shown in Table 1. EOC1 (large depression variation) contains four kinds of targets (ZSU-23-4, T-72, BRDM-2, and 2S1), with depression angles of 17° and 30° for the training and testing sets, respectively. The targets in EOC2/C (configuration variation) vary in vehicle parts, including explosive reactive armor (ERA) and an auxiliary gasoline tank. EOC2/V (version variation) corresponds to target version variation and shares an identical support set with EOC2/C, but with a different query set, as displayed in Table 2.
4.2. Experimental Results
4.2.1. Experiments in SOC
Comparative experiments including classical classifiers (CC), FSL methods, and SSFSL methods are shown in Table 3 under the 10-way K-shot FSL setting. The average recognition rate and variance over 600 random trials for each setting are displayed in Table 3. The CC algorithms include LR (logistic regression) [67], DT (decision tree) [68], SVM (support vector machine) [69], GBC (gradient-boosting classifier) [70], and RF (random forest) [71]; these methods share the same feature extractor as AADR but use their own classifiers. In Table 3, the average recognition rate of the SSFSL algorithms is higher than those of the FSL and CC methods. Although the recognition rates of the classical classifiers are unsatisfactory in few-shot conditions, some of them outperform SSFSL in the 10-way 10-shot setting. Our proposed AADR-r and AADR-t obtain relatively good recognition rates in the low-shot settings, only slightly lower than DKTS-N. DKTS-N outstrips all other methods in both the few-shot and limited-data settings in SOC for the following reason: its advantage is learning both global and local features. The samples in the training and testing sets of MSTAR SOC are similar because of the close depression angles of 17° and 15°; hence, the global and local features of the two sets are close and easily matched through the Earth mover's distance and nearest-neighbor classifiers in DKTS-N. However, highly different configurations and versions of armored vehicles lead to huge discrepancies in local features, which influence the scattering characteristics of SAR images. Therefore, the performance of DKTS-N decreases under EOCs, which is the restriction of this metric-learning-based algorithm. The proposed AADR, an optimization-based method, overcomes these difficulties and shows an overwhelming performance under EOCs.
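The metric-based matching that DKTS-N relies on can be illustrated with a minimal cosine nearest-neighbor classifier over embedded features. This is a generic sketch of metric-based few-shot classification, not the DKTS-N algorithm itself, which additionally uses local features and the Earth mover's distance.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nn_classify(query_feat, support):
    """support: list of (label, feature) pairs from the few labeled shots.
    Returns the label of the support feature most similar to the query."""
    best_label, best_sim = None, -2.0
    for label, feat in support:
        sim = cosine(query_feat, feat)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label
```

When the support and query depression angles are close, as with 17° versus 15° in SOC, such matching works well; under EOC configuration and version changes the embedded features drift apart and the metric comparison degrades, which is the behavior discussed above.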
4.2.2. Experiments in EOCs
Due to the huge differences among SAR vehicle images, FSL missions are harder under EOCs than under SOC. However, most SSFSL methods outperform FSL methods in both the SOC and EOC settings, which means the use of unlabeled data is beneficial for FSL on SAR vehicles. In addition, awareness of the azimuth angle also helps the model grasp important domain knowledge about SAR vehicles and overcome intra-class diversity and inter-class similarity in few-shot conditions. From Table 4, it is obvious that our proposed AADR-r and AADR-t perform well under EOCs, with recognition results much higher than those of the other FSL and SSFSL methods. Instead of comparing metric distances between features, model optimization through the designed loss performs well under large differences in depression angle, vehicle version, and configuration.
Although AADR-r and AADR-t share a similar process, their different losses lead to different results, such that the accuracy of AADR-r exceeds that of AADR-t most of the time. In fact, the categories of the anchor sample, hard negative sample, and hard positive sample among the unlabeled data are generated as pseudo-labels by the model trained in the pre-training stage, according to Figure 2. Thus, the pseudo-labels participate in the loss and influence the result. For instance, the anchor sample and its hard negative sample are assigned different pseudo-label categories; however, if these two samples actually belong to the same true category in the unlabeled measured data, this will lead to wrong training in the second stage. Therefore, the results of AADR-t contain more uncertainty than those of AADR-r.
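The pseudo-label sensitivity of AADR-t can be seen from the shape of a cosine-based triplet term. This is a schematic form with an assumed margin value; the exact AADR-t formulation is given in Section 3.3.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cosine_triplet(anchor, positive, negative, margin=0.2):
    """Hinge on cosine similarities: the anchor should be more similar to the
    (pseudo-labeled) hard positive than to the hard negative by at least `margin`.
    If the negative's pseudo-label is wrong and it truly matches the anchor's
    class, this term pushes genuinely similar features apart."""
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)
```

The hinge makes the gradient depend entirely on which samples the pseudo-labels designate as positive and negative, which is why mislabeled triplets translate directly into wrong training signals.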
6. Conclusions
To sum up, we put forward AADR to deal with the task of few-shot SAR target classification, especially in situations with a huge difference between the support and query sets. The use of unlabeled measured data and labeled simulated data is one of the key means of raising the recognition rate in a fresh semi-supervised manner. Additionally, azimuth-aware discriminative representation learning is an effective way to cope with the intra-class diversity and inter-class similarity among vehicle samples. In general, a large number of experiments showed that AADR is more impressive than other FSL algorithms.
There are still some flaws in the proposed method. First, due to the optimization-based design, the fully connected classifier in AADR is not satisfactory when the number of labeled samples exceeds 10: according to Figure 3, as the number increases, the performance gain is limited. Hence, how to exploit more labeled data is significant for making AADR powerful in both few-shot and limited-data situations. Second, the hyper-parameter in the loss, which indicates the proportion of the azimuth-aware module, is fixed in the current algorithm. From the results, it is hard to determine a single value of this hyper-parameter that fits all four experiments. Thus, a self-adaptive weight in the loss, related to the training epochs and learning rate, could guide the gradient descent in a better way.