1. Introduction
Synthetic aperture radar (SAR) data have been widely used in many fields, such as military surveillance, disaster warning, and environmental monitoring. SAR automatic target recognition (SAR-ATR) technology, which aims to recognize the target types of SAR images, has been comprehensively studied in the literature. With the popularity of deep learning methods in recent years, they have been effectively used for SAR-ATR, and a large amount of training data are usually needed to ensure the reliability of the trained deep models. However, due to the particularity of SAR images, the cost of acquiring a large amount of labeled SAR data is high, and a lack of training data will influence the recognition performance of the deep models [1,2,3,4,5].
Many research works have been conducted on SAR target recognition regarding cases of insufficient training data or few-shot learning [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]. In general, these methods can be divided into the following categories: (1) Data augmentation can be used to increase the number of training samples. There are traditional data augmentation methods, such as the translation of the target and adding random speckle noise used in [
6,
7], and deep generative methods, such as generating images by Wasserstein generative adversarial networks with a gradient penalty [
12] or by adversarial autoencoder [
13]. (2) More robust features can be designed, which helps to improve recognition accuracy. There are traditional features, such as the sparse representation of the monogenic signal [
8] and modified polar mapping [
9], as well as features extracted by deep networks, such as deep highway unit [
10], attribute-guided multi-scale prototypical network [
11], angular rotation generative networks [
5], and convolutional transformers [
16]. (3) Transfer learning methods can be applied [
3,
4,
5]. In this situation, auxiliary datasets can be used for transferring relevant knowledge [
5] or pre-training the network [
3,
4]. (4) Other methods, such as semi-supervised learning methods [
3,
15,
17,
18,
19,
20,
21] or meta-learning methods [
11,
14], can also be applied to the few-shot learning problem. The above methods can be combined for better results when the training data is scant.
Because measured (real) SAR data are usually difficult to acquire, a growing number of studies [1,2,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36] have utilized synthetic (simulated) SAR data for the recognition problem. The Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset [
37] provided by the Air Force Research Laboratory (AFRL), USA, has been widely used in the development of SAR-ATR algorithms over the past two decades. A dataset called the Synthetic and Measured Paired Labeled Experiment (SAMPLE) has also been introduced by AFRL to solve synthetic SAR target recognition problems [
24]. In our paper, we conduct experiments on the SAMPLE dataset.
The research works based on the synthetic SAR data for recognition problems can be roughly divided into these main categories: (1) Pre-training: using plentiful synthetic SAR data to pre-train the classifier, and then using a small amount of measured SAR data to fine-tune the pre-trained network [
1]; (2) Transformation or augmenting: transforming or augmenting the synthetic data to train the classifier to achieve better recognition results [
2,
22,
23,
24,
25,
26,
29,
34,
35]; (3) Designing features: designing more general features [
28,
30,
32,
33,
36] for training, such as ensembling traditional features [
28], features based on scattering centers [
32], or multi-scale deep features [
36]. (4) Domain adaptation methods: adding some domain adaptation constraints for optimization, which makes the trained classifier more suitable for measured data [
27,
36]; (5) Other methods: such as neural architecture searching [
31], or exploring the effectiveness of the existing methods and integrating some effective methods [
25].
Considering the research works that use synthetic SAR data for recognition, there are two main recognition situations. In the first situation, a large amount of synthetic data and a small amount of measured data are labeled for training [1,2,22,29,35]. In the second situation, only synthetic data with labels are available for training, and the measured data of the same categories as the synthetic data are used as the test data [23,24,25,26,27,28,30,31,32,33,34,36]. The second situation is more challenging, and it is the one we consider in this paper: the classifier is trained entirely on synthetic SAR data and tested on measured data. The main challenge in this setting is the distribution gap between the training and test data.
In this paper, we design a gradual domain adaptation recognition framework with pseudo-label denoising to solve the distribution gap problem and achieve state-of-the-art recognition performance using the SAMPLE dataset. The existing methods [
27,
36] based on domain adaptation only consider the feature alignment between different domains, whereas we additionally use self-training to further narrow the distribution difference between the training and test data for each category.
Figure 1 illustrates the gradual domain adaptation process which contains two stages, i.e., feature alignment classification and self-training. The two stages are explained as follows. For the first stage, there is a domain shift between the original features of the training data and the test data, and feature alignment narrows the feature distance to learn the domain-invariant features. For the second stage of self-training, the classification boundaries are further modified to fit the pseudo-labeled test data, which can also align the distribution of each category between the synthetic training data and the measured test data, thus enhancing the classification performance. The contributions of this paper can be summarized as follows:
- (1)
We design a two-stage gradual domain adaptation framework. In the warm-up stage, the feature alignment classification network is trained to obtain a domain-invariant feature representation and a high-precision initial classification result. In the fine-tuning stage, selected pseudo-labeled data are used to fine-tune the classifier, which narrows the distribution difference between the training and test data for each category.
- (2)
We propose a pseudo-label denoising method to eliminate falsely pseudo-labeled data, because such data negatively influence the fine-tuning stage. We use image similarity to measure the similarity of the pseudo-labeled data, and then keep the labels consistent between images with high similarity. The technique is simple and effective for improving the accuracy of the pseudo-labeled data, thus improving the final classification performance of the whole framework.
- (3)
We evaluate the whole framework on the recently published SAMPLE dataset. The experimental results show that the proposed framework is practical and achieves state-of-the-art recognition accuracy.
The rest of our paper is arranged as follows: in
Section 2, we briefly review some related works; in
Section 3, we introduce the gradual domain adaptation recognition framework and the pseudo-label denoising method; we provide the experimental setups and results in
Section 4; finally, in
Section 5 and
Section 6, we give some discussions and draw some conclusions, respectively.
3. Gradual Domain Adaptation Recognition Framework
We propose a gradual domain adaptation recognition framework with pseudo-label denoising to solve the distribution gap problem and make the classifier trained on the training data more applicable to the test data. The flowchart of the whole algorithm is illustrated in Figure 2. The synthetic dataset $D_s$ is in the source domain and the measured dataset $D_t$ is in the target domain. The recognition task is to train on $D_s$ and then test on $D_t$. Since there is a distribution difference between the synthetic data and the measured data, we use a feature alignment classification network $F$ to learn a domain-invariant feature representation for classification. Then, some pseudo-labeled data $D_{pl}$ from $D_t$ are selected by a pre-set threshold and the pseudo-label denoising method, and $D_{pl}$ is used to fine-tune $F$ for a better result.
In this section, we first introduce a basic recognition network; then, we introduce the gradual domain adaptation recognition framework, which contains two domain adaptation stages: the feature alignment classification network and the self-training method; finally, we also present the proposed pseudo-label denoising method, which we carry out in the self-training process. The detailed introductions are given below.
3.1. Basic Recognition Network
A previous study [
31] demonstrated that overly deep network structures might not generalize well when there is a domain shift between the training and test data. We therefore consider a five-layer network structure sufficient for extracting discriminative deep features, since A-ConvNets have achieved state-of-the-art recognition accuracy on the ten-class MSTAR recognition problem [
6]. As shown in
Figure 3, we refer to A-ConvNets when designing the basic recognition network and modify the kernel sizes to fit the image sizes.
Here, we introduce the architecture of the basic recognition network. The network has a five-layer all-convolutional architecture, and there is a dropout [
58,
59] operation after the fourth convolutional layer. The kernel size of the first four layers is 5 × 5, and the kernel size of the fifth layer is 4 × 4. We denote the five-layer basic recognition network by $F(\cdot\,; \theta)$, where $\theta$ represents the corresponding parameters.
For an input sample $x$ with a one-hot ground-truth category label $y$, the predicted label of the recognition network $F$ with parameters $\theta$ can be expressed as a 10-dimensional vector $\hat{y}$:
$$\hat{y} = F(x; \theta) = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_K]$$
where $\hat{y}_i$ is the $i$-th component of the vector $\hat{y}$, which indicates the probability that the sample $x$ belongs to the $i$-th category. $K$ is the number of categories, and $K = 10$ in our experiment.
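The classification loss $L_{cls}$, which is the cross-entropy loss [60], is calculated on the labeled synthetic dataset $D_s$:
$$L_{cls} = -\sum_{(x, y) \in D_s} \sum_{i=1}^{K} y_i \log \hat{y}_i$$
where $y_i$ is the $i$-th component of the ground-truth category vector $y$, $\hat{y}_i$ is the predicted probability of the $i$-th category for the sample $x$, and $K$ is the number of categories.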
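As a small illustration of the two quantities above, the following sketch computes class probabilities and the cross-entropy loss for a single sample. It assumes that the network ends in a softmax layer, which the text does not state explicitly; the logits and the helper names `softmax` and `cross_entropy` are hypothetical and chosen only for this example.

```python
import math


def softmax(logits):
    """Map raw network outputs to class probabilities that sum to 1 (assumed output layer)."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and the predicted probabilities."""
    return -sum(y * math.log(p + eps) for y, p in zip(one_hot, probs))


K = 10                      # number of categories, as in the experiments
logits = [0.0] * K          # hypothetical raw outputs for one sample
logits[3] = 5.0
label = [0.0] * K           # one-hot ground truth for category 3
label[3] = 1.0

probs = softmax(logits)
loss = cross_entropy(label, probs)
uniform_loss = cross_entropy(label, [1.0 / K] * K)  # loss of an uninformed prediction
```

A confident correct prediction gives a loss close to zero, whereas a uniform prediction over ten categories gives a loss of about log(10).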
3.2. Stage One: Feature Alignment Classification Network
As shown in Figure 3, a recognition network $F$ is usually trained on the labeled training dataset $D_s$. Thus, when there is a distribution gap between the training and test data, the trained network may not fit the test dataset $D_t$ well.
We design a feature alignment classification network $F$ to learn a domain-invariant feature representation for a better recognition result.
Figure 4 gives the illustration of the feature alignment classification network, and an earlier version is introduced in Ref. [
36]. However, while the earlier version utilizes a multi-scale deep network as the feature extractor, we use the all-convolutional architecture as the feature extractor.
Figure 4 shows that the network parameters are shared for the labeled synthetic dataset $D_s$ and the unlabeled measured dataset $D_t$. The classification loss $L_{cls}$ is calculated on the labeled synthetic dataset, and the feature alignment restriction $L_{align}$ is calculated between the fourth-layer features of $x_s$ and $x_t$, where $x_s$ and $x_t$ represent samples in the labeled synthetic dataset $D_s$ and the unlabeled measured dataset $D_t$, respectively.
The deep features of the fourth layer are expressed as
and
. The feature alignment loss between the features of the synthetic and measured datasets can be simply expressed as follows [
36,
42]:
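Thus, the total loss function of the feature alignment classification network can be expressed as follows:
$$L = L_{cls} + \lambda L_{align}$$
where $L_{cls}$ is the classification loss, $L_{align}$ is the feature alignment loss, and $\lambda$ determines the proportion of the feature alignment loss in the total loss $L$.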
As a warm-up, the feature alignment classification network is trained to optimize the total loss function and thereby learn the domain-invariant feature representation. Optimizing the classification loss ensures effective recognition performance on the synthetic training dataset $D_s$, while optimizing the feature alignment loss narrows the feature difference between the training and test data, thereby improving the classification result on the measured test dataset $D_t$.
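The way the two loss terms are combined in the warm-up stage can be sketched as follows. The exact form of the feature alignment loss is not recoverable from the text here, so this example assumes a simple stand-in: the squared Euclidean distance between the mean feature vectors of a synthetic batch and a measured batch. The function names, the toy features, and the weight of 10 for the alignment term are assumptions made for illustration only.

```python
def mean_vector(features):
    """Average a list of equally sized feature vectors component-wise."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def alignment_loss(source_features, target_features):
    """Assumed stand-in: squared distance between the mean features of the two domains."""
    mu_s = mean_vector(source_features)
    mu_t = mean_vector(target_features)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def total_loss(cls_loss, align_loss, lam=10.0):
    """Classification loss plus the weighted feature alignment loss."""
    return cls_loss + lam * align_loss


# Hypothetical fourth-layer features of a synthetic batch and a measured batch.
synthetic = [[1.0, 2.0], [3.0, 4.0]]
measured = [[2.0, 2.0], [4.0, 6.0]]

l_align = alignment_loss(synthetic, measured)  # means (2, 3) and (3, 4)
l_total = total_loss(0.5, l_align)             # 0.5 is a made-up classification loss
```

When the two domains have identical mean features the alignment term vanishes and only the classification loss remains.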
3.3. Stage Two: Self-Training
Self-training is a common semi-supervised learning method, and it can also be utilized for further domain adaptation to narrow the distribution difference between the training and test data for each category.
Figure 5 shows the self-training process.
A classifier is initially trained on the labeled dataset $D_s$ to obtain predictions on the unlabeled dataset $D_t$. Then, some pseudo-labeled measured data are selected, and in the self-training process, the pseudo-labeled measured data and the labeled synthetic data are used together to train the classifier. Since the classifier has seen both, its decision boundaries are modified to fit the measured data better, which also aligns the features of the same categories in the synthetic and measured domains.
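The self-training loop described above can be illustrated with a deliberately tiny example. This is only a sketch under strong simplifying assumptions: a nearest-centroid classifier on scalar features replaces the deep network, hard pseudo-labels are used, and the confidence is a softmax over negative distances. The data, the threshold of 0.8, and all function names are hypothetical.

```python
import math


def centroids(samples):
    """Compute one centroid per class from (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(cents, x):
    """Return (label, confidence) from a softmax over negative distances."""
    scores = {y: math.exp(-abs(x - c)) for y, c in cents.items()}
    label = max(scores, key=scores.get)
    return label, scores[label] / sum(scores.values())


def self_train(source, target, threshold=0.8, rounds=3):
    """Refit on source data plus confidently pseudo-labeled target data."""
    cents = centroids(source)
    for _ in range(rounds):
        pseudo = []
        for x in target:
            label, confidence = predict(cents, x)
            if confidence > threshold:
                pseudo.append((x, label))
        cents = centroids(source + pseudo)
    return cents


# Labeled source domain and an unlabeled target domain shifted by +1.
source = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1)]
target = [1.0, 2.0, 5.0, 6.0]

initial = centroids(source)
adapted = self_train(source, target)
```

In this toy run the sample at 2.0 is not confident enough in the first round, but it is accepted once the class centroids have moved toward the target domain, which mirrors how the decision boundaries gradually adapt to the measured data.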
When the feature alignment classification network $F$ has been well trained, the predicted pseudo-label $\hat{y}_t$ of an unlabeled measured sample $x_t$ can be obtained:
$$\hat{y}_t = F(x_t), \quad p_t = \max_{i} \hat{y}_{t,i}$$
where $p_t$ indicates the probability of the sample $x_t$ belonging to the predicted category.
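The threshold value $T$ is set, and some pseudo-labeled data are selected as $D_{pl}$ to fine-tune the whole network. A pseudo-labeled sample ($x_t$, $\hat{y}_t$) is selected if the probability of its predicted category exceeds $T$; otherwise, it is not selected. We use the predicted soft labels as the pseudo-labels for fine-tuning, which means that the classification loss $L_{cls}^{pl}$ calculated on the pseudo-labeled dataset $D_{pl}$ can be expressed as follows:
$$L_{cls}^{pl} = -\sum_{(x_t, \hat{y}_t) \in D_{pl}} \sum_{i=1}^{K} \hat{y}_{t,i} \log \tilde{y}_{t,i}$$
where $\hat{y}_{t,i}$ is the $i$-th component of the soft pseudo-label $\hat{y}_t$, and $\tilde{y}_t$ is the newly predicted label for the sample $x_t$ after fine-tuning. The pseudo-labels are constantly updated.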
In the fine-tuning process, the total loss function can be expressed as follows:
We can see that in the self-training process, the classification loss is calculated on both the labeled synthetic dataset $D_s$ and the pseudo-labeled dataset $D_{pl}$.
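The threshold-based selection of pseudo-labeled samples can be sketched in a few lines. The example assumes that a sample is kept when its highest class probability is strictly greater than the threshold; whether the comparison is strict, the threshold of 0.9, and the toy predictions are all assumptions, and `select_pseudo_labels` is a hypothetical helper name.

```python
def select_pseudo_labels(probabilities, threshold):
    """Keep samples whose highest class probability exceeds the threshold.

    Returns a list of (sample_index, predicted_class, soft_label) tuples,
    where the soft label is the full predicted probability vector.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence > threshold:
            selected.append((index, probs.index(confidence), probs))
    return selected


# Hypothetical network outputs for three measured samples and three classes.
predictions = [
    [0.97, 0.02, 0.01],
    [0.40, 0.35, 0.25],
    [0.05, 0.92, 0.03],
]

pseudo_labeled = select_pseudo_labels(predictions, threshold=0.9)
```

The second sample is discarded because none of its class probabilities is high enough, while the other two keep their full probability vectors as soft pseudo-labels.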
The performance of the self-training method mainly depends on the initially trained classifier, so we use the feature alignment classification network as the initial classifier for a better recognition result. However, the pseudo-labeled dataset selected by the threshold is not entirely accurate and reliable: since the recognition result on the test data is not perfectly correct, the pseudo-labels of the test data are not perfectly accurate either. Therefore, in addition to the pre-set threshold $T$, we consider a new criterion when selecting pseudo-labeled data for fine-tuning.
3.4. Pseudo-Label Denoising Based on Image Similarity
In the fine-tuning process, we consider combining the image information to select reliable pseudo-labels. Inspired by Ref. [
28], which combines the CNN feature and image similarity for classification, and Ref. [
55], which also considers pseudo-label denoising to keep the predicted labels consistent for the K nearest neighbors in the pixel-wise classification, in our study, we use the image similarity and k-Nearest Neighbor (KNN) [
60] algorithm to eliminate some pseudo-labeled data, and the process is shown in
Figure 6.
The distribution difference between the synthetic and measured data may be reflected in background distribution or different features, such as target shape, gray value, etc. Thus, different aspects of image features can be reflected in different similarity measures, and the measured samples with high similarities are more likely to have the same labels. According to this assumption, we design the pseudo-label denoising method to keep the label consistent between the measured test samples with high similarities.
Figure 6 (left) shows that there are some mispredicted samples (in red) in the preliminary pseudo-labeled dataset, and the samples in the same circle have relatively high similarity to each other under the similarity measure. Figure 6 (middle) shows that the predicted pseudo-labels are required to be consistent between each sample and its most similar sample (i.e., its nearest neighbor under KNN with k = 1). If the two pseudo-labels are consistent, the pseudo-labeled sample is kept; otherwise, it is eliminated. Thus, the mispredicted samples can be eliminated by comparing the similarities and the predicted pseudo-labels, as shown in Figure 6 (right). The pseudo-label denoising method is introduced as follows (see Algorithm 1).
Algorithm 1 Pseudo-label denoising method procedure.
Input: The threshold value $T$.
Output: The pseudo-labeled dataset $D_{pl}$ after denoising.
Step 1: Obtain the preliminary pseudo-labeled dataset $D_{pre}$ according to the pre-set threshold $T$. Initialize $D_{pl}$ as ∅.
Step 2: Calculate the similarity matrix $S = [S_{ij}]_{n \times n}$ of every two samples in $D_{pre}$, where $S_{ij}$ is the image similarity between the $i$-th and $j$-th samples and $n$ is the number of samples in $D_{pre}$.
Step 3: for $t$ = 1 to $n$ do
    (1) Make $S_{tt} = 0$;
    (2) Denote the pseudo-label of the sample $x_t$ by $\hat{y}_t$. According to $S$, find the sample with the highest image similarity to $x_t$ and denote its pseudo-label by $\hat{y}_{t'}$;
    (3) Check whether the pseudo-labels $\hat{y}_t$ and $\hat{y}_{t'}$ are consistent;
    if consistent then
        add the $t$-th sample to the denoised dataset $D_{pl}$;
    end if
end for
Step 4: The denoised pseudo-labeled dataset $D_{pl}$ and its corresponding labels are obtained after checking all the samples in $D_{pre}$.
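The consistency check of Algorithm 1 can be sketched as follows, assuming that the similarity matrix has already been computed and that hard class indices are compared. Instead of zeroing the diagonal, this sketch simply skips the sample itself when searching for its nearest neighbor, which has the same effect. The labels, the similarity values, and the function name are hypothetical.

```python
def denoise_pseudo_labels(labels, similarity):
    """Return indices of samples whose most similar other sample shares their pseudo-label."""
    count = len(labels)
    kept = []
    for t in range(count):
        # Candidates for the nearest neighbor (k = 1), excluding the sample itself.
        others = [j for j in range(count) if j != t]
        if not others:
            continue
        nearest = max(others, key=lambda j: similarity[t][j])
        if labels[nearest] == labels[t]:
            kept.append(t)
    return kept


# Hypothetical pseudo-labels; sample 4 is mislabeled as class 0.
pseudo_labels = [0, 0, 1, 1, 0]

# Hypothetical symmetric similarity matrix between the five samples.
similarity_matrix = [
    [1.0, 0.9, 0.2, 0.1, 0.3],
    [0.9, 1.0, 0.1, 0.2, 0.2],
    [0.2, 0.1, 1.0, 0.8, 0.7],
    [0.1, 0.2, 0.8, 1.0, 0.6],
    [0.3, 0.2, 0.7, 0.6, 1.0],
]

kept_indices = denoise_pseudo_labels(pseudo_labels, similarity_matrix)
```

Sample 4 is most similar to sample 2, whose pseudo-label differs, so it is removed, while the four consistent samples are kept.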
There are five image similarities used in Ref. [
28]; however, in our paper, we only use three because the other two depend on the results of image segmentation. Thus, we use cosine similarity (COSS), normalized mutual information similarity (NMIS), and structural similarity (SSIM). We briefly introduce them as follows. More details about the similarities can be found in Ref. [
28].
$X$ and $Y$ represent the two images.
- (1)
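COSS:
$$\mathrm{COSS}(X, Y) = \frac{\sum_{i,j} X(i,j)\, Y(i,j)}{\sqrt{\sum_{i,j} X(i,j)^2}\, \sqrt{\sum_{i,j} Y(i,j)^2}}$$
where $X(i,j)$ and $Y(i,j)$ represent the pixel values of the two images $X$ and $Y$ at coordinate $(i,j)$, respectively.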
- (2)
NMIS:
where
and
represent information and joint information entropy, respectively.
- (3)
SSIM:
where
or
,
or
, and
represent the mean, variance, and covariance of the two images, respectively.
,
, and
are the constants set according to Ref. [
28].
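Two of the three similarity measures can be illustrated with a short sketch. The cosine similarity follows its standard definition on flattened images. For SSIM, the sketch assumes the widely used simplified two-constant form computed once over the whole image (no sliding window), with the common default constants for 8-bit images; this corresponds to setting the third constant to half of the second and may differ from the constants and windowing used in the paper. NMIS is omitted because its exact normalization is not recoverable here. Images are plain nested lists and all names are hypothetical.

```python
import math


def flatten(image):
    """Turn a 2D image (list of rows) into a flat list of floats."""
    return [float(pixel) for row in image for pixel in row]


def cosine_similarity(image_a, image_b):
    """Cosine of the angle between the two flattened images."""
    x, y = flatten(image_a), flatten(image_b)
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm > 0 else 0.0


def global_ssim(image_a, image_b, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Simplified SSIM from global statistics (assumed form and constants)."""
    x, y = flatten(image_a), flatten(image_b)
    count = len(x)
    mu_x, mu_y = sum(x) / count, sum(y) / count
    var_x = sum((p - mu_x) ** 2 for p in x) / count
    var_y = sum((q - mu_y) ** 2 for q in y) / count
    cov = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / count
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


image = [[0, 50], [100, 200]]
reversed_image = [[200, 100], [50, 0]]

coss_same = cosine_similarity(image, image)
coss_diff = cosine_similarity(image, reversed_image)
ssim_same = global_ssim(image, image)
ssim_diff = global_ssim(image, reversed_image)
```

Identical images score 1 under both measures, while the reversed image scores much lower, and its negative covariance with the original makes the simplified SSIM negative.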
4. Experimental Results
4.1. Experimental Dataset
The SAMPLE dataset contains a total of 10 types of data, each of which contains pairs of measured and synthetic data. The original image size is 128 × 128, and the resolution is 0.3 m × 0.3 m. The depression angles of the synthetic and measured data range from 14° to 17°. More details can be found in Ref. [
24]. The data types and numbers of the original SAMPLE dataset are introduced in
Table 1, and the synthetic samples are one-to-one matched with the measured samples. This experimental setting is called “Training Scenario I”.
Figure 7 shows one (measured, synthetic) pair from each type. We can clearly see that the synthetic images lack background clutter, which is also mentioned in the literature [
23,
24]. Additionally, the strong scattering centers are somewhat different for the synthetic and measured data.
We also consider the experimental setting in Ref. [
25] in our paper. The depression angles of the synthetic data range from 14° to 16°, and the depression angle of the measured data is 17°. Compared with “Training Scenario I”, the synthetic and measured data differ in depression angle.
Table 2 shows the specific data types and numbers of samples for training and testing, and this experimental setting is called “Training Scenario II”.
The experimental models are trained and tested on an Intel(R) Xeon(R) CPU E5-2643 v4 @ 3.40 GHz with 128 GB of memory and an Nvidia GeForce GTX 1080 Ti with 11 GB of memory. The operating system is Ubuntu 16.04.1, and the development environment is Python 2.7.12 with TensorFlow 1.8.0.
Our work explores the problem of training fully on the synthetic data and testing on the measured data. Therefore, training data refers to synthetic data and test data refers to measured data. We explore the effectiveness of our framework under two training scenarios. In the following experiments:
- (1)
for the image preprocessing, we crop the data to the size of 64 × 64 to exclude the influence of the background, and then we normalize them by min-max scaling to the range [0, 255].
- (2)
for the network parameters, we use the grid search [
61] method to explore the parameters. We provide a list of candidate values for the key parameters, and their final values are determined through experiments. The final parameter combination has a batch size of 256, a dropout value of 0.2, and a feature alignment loss weight $\lambda$ of 10. We set the total number of epochs to 500. We use the Adam [
62] optimizer with a learning rate of
.
- (3)
for the presented experimental results, we repeat each group of experiments 20 times and report the lowest and highest recognition rates, denoted by “Min” and “Max”, respectively. “Ave” denotes the average recognition rate over the 20 runs together with the standard deviation.
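The preprocessing in item (1) and the reporting in item (3) can be sketched as follows. The sketch assumes that the 64 × 64 crop is taken from the image center and that the population standard deviation is reported; neither detail is stated in the text. Images are plain nested lists, and the recognition rates passed to `summarize` are made-up numbers.

```python
import statistics


def center_crop(image, size):
    """Cut a size x size window from the middle of a 2D image (assumed crop position)."""
    height, width = len(image), len(image[0])
    top, left = (height - size) // 2, (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def min_max_scale(image, upper=255.0):
    """Linearly rescale pixel values to the range [0, upper]."""
    low = min(min(row) for row in image)
    high = max(max(row) for row in image)
    span = high - low
    if span == 0:
        return [[0.0 for _ in row] for row in image]
    return [[(pixel - low) / span * upper for pixel in row] for row in image]


def summarize(rates):
    """Min, Max, and Ave with standard deviation over repeated runs."""
    return {
        "min": min(rates),
        "max": max(rates),
        "ave": statistics.mean(rates),
        "std": statistics.pstdev(rates),  # population std is an assumption
    }


# Synthetic 128 x 128 test image whose pixel value encodes its position.
image = [[row * 128 + col for col in range(128)] for row in range(128)]
cropped = center_crop(image, 64)
scaled = min_max_scale(cropped)

summary = summarize([90.0, 92.0, 94.0])  # hypothetical recognition rates in percent
```

The crop keeps rows and columns 32 to 95 of the original image, and after scaling the smallest and largest pixels map to 0 and 255.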
4.2. Recognition Results of Basic Recognition Network
We perform the experiments that utilize the basic recognition network introduced in
Section 3.1 for the two training scenarios without feature alignment and self-training.
Table 3 shows the experimental results. Furthermore, to check the convergence property of our designed basic recognition network structure under two training scenarios, the curves of the loss value versus the number of epochs are shown in
Figure 8, and the epoch number is set to 200 when training the basic recognition network.
When we carry out the experiments under Training Scenario II, the average recognition results are lower and fluctuate more. As mentioned before, the training and test data of Training Scenario I are paired, and the quantities of training and test data under Training Scenario I are larger. Thus, it is not suitable to directly compare the recognition results of these two scenarios. As shown in Figure 8, the curve in Figure 8a is smoother, with almost no oscillation compared to Figure 8b. The only difference between the settings of Figure 8a,b is the training data, which also illustrates that Training Scenario II is more complicated. Nevertheless, the final convergence values in Figure 8a,b are similar, and the experimental results are similar under both training scenarios. The following validation experiments are carried out under Training Scenario II since the experiments are faster with a smaller amount of training data.
Then, we observe the experimental results under different data augmentation methods, namely by adding Gaussian noise [
25], speckle noise [
7], and translational augmentation [
6,
7]. Adding noise may address the distribution gap problem because the synthetic data lack background clutter.
In Ref. [
25], to add Gaussian noise
to a training image
x, a standard deviation (std)
is set, and the augmented image can be expressed as
(mean = 0, std =
). In our paper, we set the
range as
, and thus the training data is augmented by 9 times. In Ref. [
7], to apply speckle noise to a training image
x, the truncated exponential distribution
s, which imposes that the maximum intensity of noise samples does not exceed a given parameter
a, is used, and the augmented image can be expressed as
.
is the image after applying a median filter on the
x. The
a range is set as
, and thus the training data is also augmented by 9 times. For translational augmentation, given a training image
x, the translated image
can be written as
, which illustrates that all pixels in
x are shifted left by
u pixels and shifted down by
v pixels.
has the same size as
x. To guarantee that the target does not exceed the image boundary, the absolute values of
u and
v are smaller than 20.
Figure 9 gives an illustration of the three data augmentation operations.
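Two of the augmentation operations described above can be sketched briefly. The Gaussian variant adds zero-mean noise with a chosen standard deviation to every pixel. The translation variant shifts the image left by u pixels and down by v pixels while keeping the image size; it assumes that vacated pixels are filled with zeros and that pixels leaving the frame are discarded, which the text does not specify. The speckle variant is omitted because its exact formula is not recoverable here. The toy image, the noise level, and the function names are hypothetical.

```python
import random


def add_gaussian_noise(image, std, rng):
    """Add zero-mean Gaussian noise with the given standard deviation to each pixel."""
    return [[pixel + rng.gauss(0.0, std) for pixel in row] for row in image]


def translate(image, u, v, fill=0.0):
    """Shift the image left by u and down by v pixels, keeping its size.

    Row indices grow downward. Vacated pixels are set to `fill` (assumed).
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            new_row, new_col = row + v, col - u
            if 0 <= new_row < height and 0 <= new_col < width:
                shifted[new_row][new_col] = image[row][col]
    return shifted


# Toy 4 x 4 image with a single bright pixel at row 1, column 2.
image = [[0.0] * 4 for _ in range(4)]
image[1][2] = 1.0

rng = random.Random(0)  # fixed seed so the example is repeatable
noise_free = add_gaussian_noise(image, 0.0, rng)
noisy = add_gaussian_noise(image, 5.0, rng)
moved = translate(image, u=1, v=2)
```

With u = 1 and v = 2 the bright pixel moves from row 1, column 2 to row 3, column 1, and a zero standard deviation leaves the image unchanged.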
We also conduct the experiments on the basic recognition network introduced in
Section 3.1 under Training Scenario II, and we augment the training data using the three data augmentation methods.
Table 4 provides the recognition results.
Speckle and Gaussian noise augmentation are both helpful for improving recognition rates. Speckle noise yields a larger improvement than Gaussian noise, with a higher average recognition rate and a smaller standard deviation. The literature [
63] also mentions that adding speckle noise for augmentation is more effective than adding Gaussian noise. Moreover, we can see that translational augmentation does not help to improve recognition accuracy. The literature [
25] also considers rotation augmentation; however, the experimental results show that rotation augmentation is not helpful for improving recognition performance [
25]. This type of augmentation method, which includes translation, rotation, or their combination, may not be suitable in this scenario, and may even lead to an overfitting problem with unsatisfactory results.
Therefore, we consider using speckle noise augmentation in our following experiments, which provides better results.
4.3. Results of Gradual Domain Adaptation Recognition Framework
This subsection discusses the recognition results of our proposed gradual domain adaptation recognition framework. It is worth mentioning that pseudo-label denoising is not carried out in this subsection. Because the gradual domain adaptation recognition framework is a two-stage training process, we discuss the effectiveness of both the feature alignment and self-training, represented by “feature alignment” and “self-training”, respectively. “Baseline” gives the result of the basic recognition network introduced in
Section 3.1 with speckle noise augmentation of the training data.
Table 5 and
Table 6 show the experimental results under Training Scenarios I and II, respectively.
Table 5 and
Table 6 show that the proposed gradual domain adaptation recognition framework is applicable under both Training Scenarios I and II. Furthermore, the recognition results show incremental improvements. We use the t-distributed stochastic neighbor embedding (t-SNE) [
64] method, which can visualize high-dimensional data by mapping them into low-dimensional space, to further illustrate the effectiveness of our framework and compare the feature distributions extracted by the gradual domain adaptation classification network. We map the original vectorized images and features into a 2D space, and
Figure 10 shows the visualization results.
The t-SNE visualization results also illustrate that our proposed gradual domain adaptation framework is effective.
Figure 10a shows that there is a distribution gap between the original measured and synthetic data. Most of the training data are already distinguishable after the feature alignment classification, as shown in
Figure 10b. However, some categories are not well separated, and the feature distribution of some categories is not compact enough. As shown in
Figure 10c, after feature alignment and self-training, the feature distribution of the different categories is more compact, as expected. Still, we can see that some samples are in the clusters of other classes (a sample of the test-8 category is in the train-5 cluster, which is shown in the grey circle of
Figure 10c).
Thus, in the following subsection, pseudo-label denoising is considered in the self-training process for a better recognition result.
4.4. Effectiveness of Pseudo-Label Denoising Method
We perform pseudo-label denoising in the self-training process. In this subsection, we use three similarity measures, namely COSS, NMIS, and SSIM, for pseudo-label denoising according to Algorithm 1.
Table 7 and
Table 8 show the experimental results of the gradual domain adaptation recognition framework with pseudo-label denoising under Training Scenarios I and II, respectively.
Pseudo-label denoising with different similarity measures helps to improve recognition performance. SSIM performs better, with higher average recognition rates in both Training Scenarios I and II. Here, we also present the confusion matrices of the pseudo-labeled dataset before and after denoising to illustrate the effectiveness of pseudo-label denoising, which are shown in
Figure 11.
We can see that in the pseudo-labeled dataset without denoising, there are some mispredicted samples obtained by the feature alignment classification network.
Figure 11b–d show that after the pseudo-label denoising, the correctness of the pseudo-labeled data has improved, and all the false pseudo-labeled data are eliminated when using the SSIM similarity for denoising. Mispredicted pseudo-labeled samples are the main factors that affect the further improvement of recognition rates, and label denoising can improve recognition performance. These results are also consistent with the experimental results in
Table 7 and
Table 8, which illustrate that SSIM performs better. Therefore, the final results in the following experiments are obtained with SSIM-based pseudo-label denoising.
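The kind of comparison made with the confusion matrices before and after denoising can be reproduced with a small sketch. It builds a confusion matrix from true and pseudo-labels and reads the pseudo-label accuracy from its diagonal. The labels and the set of samples kept after denoising are made-up values for illustration, and the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count samples by (true class, predicted class); rows are true classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical ground truth and pseudo-labels for six selected samples.
truth = [0, 0, 1, 1, 2, 2]
pseudo = [0, 1, 1, 1, 2, 2]

before = confusion_matrix(truth, pseudo, num_classes=3)

# Suppose denoising removed sample 1, the only mislabeled one.
kept = [0, 2, 3, 4, 5]
after = confusion_matrix([truth[i] for i in kept], [pseudo[i] for i in kept], num_classes=3)

accuracy_before = accuracy(before)
accuracy_after = accuracy(after)
```

Off-diagonal entries are the falsely pseudo-labeled samples, so a purely diagonal matrix after denoising means that every remaining pseudo-label is correct.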
4.5. Comparisons with Other Methods
In this subsection, we compare our framework with other existing methods using the SAMPLE dataset under the two experimental settings, given in
Table 9 and
Table 10, respectively. More experimental results of different CNN architectures can be found in [
25,
31], and we only present the best recognition results achieved in [
25,
31] as the comparison.
Currently, more studies are carried out under Training Scenario I, and the best recognition accuracy is achieved by neural architecture searching [
31]. However, neural architecture searching is computationally expensive and usually difficult to achieve in real-world systems. The recognition results of our framework under Training Scenario I reach a relatively high level of accuracy. In addition, under Training Scenario II, our framework exceeds the results presented in the literature [
25], which utilized an ensemble of five models. Other research works use the SAMPLE dataset for recognition with different experimental settings. For example, in Ref. [
29], in addition to the labeled synthetic data, one or two measured samples per class are also used for training, and the proposed augmentation technique achieves 95.1% classification accuracy; our experimental results still outperform it. Thus, our proposed method achieves state-of-the-art performance.
We further illustrate the effectiveness of the whole framework through the confusion matrices of the test data in
Figure 12, which are obtained at different training stages under Training Scenario II.
Figure 12a,b gives the results of the basic recognition network without and with the data augmentation operation, respectively.
Figure 12c gives the result of the feature alignment classification network with speckle noise augmentation. Moreover,
Figure 12d gives the final result of the whole framework with pseudo-label denoising.
Firstly, we can see that the recognition results are progressively better. Secondly, as seen in
Figure 12a–c, the M35, 2S1, and M548 categories are hard to recognize. Thirdly, comparing
Figure 12b,c, feature alignment may cause the features of M35 and M548 to become closer. Fourthly, comparing
Figure 12c,d, we can see that the denoised pseudo-labeled dataset improves the recognition results effectively. Therefore, all the methods of our framework work together to obtain the final results.
4.6. Ablation Analysis of Our Proposed Method
Because our overall framework combines many different kinds of methods, such as data augmentation, feature alignment between domains, self-training, and the pseudo-label denoising method, we consider the ablation experiments in this subsection to verify the effectiveness of these methods (
Table 11). We perform the ablation analysis under Training Scenario II.
The ablation analysis shows that the performance degrades when any component of our framework is removed, which verifies the following. (1) Speckle noise augmentation, feature alignment, and self-training are all effective for improving the performance. (2) The use of self-training usually leads to a higher standard deviation, since the effectiveness of self-training mainly depends on the initially trained classifier and the size of the original training dataset: a larger original dataset makes the pseudo-labeled test data a smaller proportion of the data used in the fine-tuning process. Thus, we use the feature alignment classification network trained on the speckle-noise-augmented training data, which provides more stable classification performance. (3) The pseudo-label denoising scheme leads to a better result, with a higher and more stable recognition rate. Overall, the components of the framework work together to obtain the final result.
5. Discussion
We also discuss some limitations of the proposed method here. For the problem of training fully on synthetic data and testing on measured data considered in this paper, we need to know the categories of the synthetic and real data in advance. However, in a real-world recognition scenario, it may be difficult to know the categories of the real data in advance. In addition, self-training is a crucial step in the proposed framework, and it relies on the initially trained classifier. We have tried different basic recognition network structures; with them, the proposed feature alignment and self-training with pseudo-label denoising still improve recognition performance, but the final experimental results are not as good. This is why we propose a feature alignment classification network for a better initial recognition result and a pseudo-label denoising method to eliminate the falsely pseudo-labeled data. There is still room for improving the pseudo-label denoising method, and some pre-set parameters are manually chosen and could be optimized. Overall, as seen from the experimental results, the proposed gradual domain adaptation framework achieves incremental improvements.