Peer-Review Record

Angle-Controllable SAR Image Generation for Target Recognition with Few Samples

Remote Sens. 2025, 17(7), 1206; https://doi.org/10.3390/rs17071206
by Xilin Wang 1, Bingwei Hui 2,*, Wei Wang 2, Pengcheng Guo 1, Lei Ding 3 and Huangxing Lin 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 28 February 2025 / Revised: 22 March 2025 / Accepted: 27 March 2025 / Published: 28 March 2025

Round 1

Reviewer 1 Report (Previous Reviewer 1)

Comments and Suggestions for Authors

The claim made by the authors in the response letter and in the revised work that the full MSTAR dataset contains "targets captured at angular intervals ranging from 1° to 2°. We employed its subsets with 1° azimuth sampling to benchmark angular resolution limits" is problematic. The number of SAR images for BMP2-9563, BTR60, BTR70 and T72-812 at the elevation angle of 15 degrees is only 195-196. If the angular interval were 1°, Agarwal et al. would not have tried to synthesize the SAR image for θ = 57 degrees with the SAR image for θ = 56 degrees (see "T. Agarwal, N. Sugavanam, and E. Ertin, “Sparse Signal Models for Data Augmentation in Deep Learning ATR,” 2020 IEEE Radar Conference (RadarConf20), pp. 1–6, 2020."). To justify their claim, the authors have to provide a list of the MSTAR data files they used for the experiment (only the file names are required) in the next response letter, so that I can check the angle interval. Note that the MSTAR data contain detailed information regarding the data collection geometry. An example is attached for the authors' information.

 

Comments for author File: Comments.pdf

Author Response

Reviewer 1

The claim made by the authors in the response letter and in the revised work that the full MSTAR dataset contains "targets captured at angular intervals ranging from 1° to 2°. We employed its subsets with 1° azimuth sampling to benchmark angular resolution limits" is problematic. The number of SAR images for BMP2-9563, BTR60, BTR70 and T72-812 at the elevation angle of 15 degrees is only 195-196. If the angular interval were 1°, Agarwal et al. would not have tried to synthesize the SAR image for θ = 57 degrees with the SAR image for θ = 56 degrees (see "T. Agarwal, N. Sugavanam, and E. Ertin, “Sparse Signal Models for Data Augmentation in Deep Learning ATR,” 2020 IEEE Radar Conference (RadarConf20), pp. 1–6, 2020."). To justify their claim, the authors have to provide a list of the MSTAR data files they used for the experiment (only the file names are required) in the next response letter, so that I can check the angle interval. Note that the MSTAR data contain detailed information regarding the data collection geometry. An example is attached for the authors' information.

AR: 

We have checked the experimental dataset and confirm that the angular intervals range from 1° to 2°, as reported in the paper. Below we attach the names of the data files as requested (the full file list is included in the attached response file):

The angle-synthesized images of the MSTAR dataset referenced above are shown in the attached response file. The images of 2S1, BRDM-2, D7, T62, ZIL131, ZSU23, and BTR60 all come from the dataset named MSTAR_PUBLIC_MIXED_TARGETS_CD2, where the target angle interval is 1°. The T72 data come from MSTAR_PUBLIC_T_72_VARIANTS_CD2, where the target angle interval is also 1°.
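For reference, the azimuth sampling of any MSTAR subset can be checked directly from the file headers: each MSTAR file begins with an ASCII Phoenix header containing a TargetAz field. The sketch below (a generic Python illustration with a hypothetical folder path, not our actual verification script) reads those fields and reports the gaps between consecutive looks:

```python
import re
from pathlib import Path

def read_target_azimuth(path):
    """Pull the TargetAz field out of the ASCII Phoenix header that
    precedes the image data in an MSTAR file."""
    header = Path(path).read_bytes()[:4096].decode("ascii", errors="ignore")
    match = re.search(r"TargetAz\s*=\s*([0-9.+-]+)", header)
    return float(match.group(1)) if match else None

def azimuth_gaps(directory):
    """Sorted azimuth angles for every file in a target folder, plus the
    gap between each pair of consecutive looks."""
    angles = sorted(
        az
        for f in Path(directory).iterdir()
        if f.is_file() and (az := read_target_azimuth(f)) is not None
    )
    gaps = [round(b - a, 2) for a, b in zip(angles, angles[1:])]
    return angles, gaps

# Hypothetical layout; point this at one target/depression combination.
angles, gaps = azimuth_gaps("MSTAR_PUBLIC_MIXED_TARGETS_CD2/15_DEG/BTR60")
print(f"{len(angles)} looks; gaps range from {min(gaps)} to {max(gaps)} degrees")
```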

AR: 

We have also found documentation of this dataset that describes the angular intervals. See: (link/figure)

 

Author Response File: Author Response.docx

Reviewer 2 Report (Previous Reviewer 5)

Comments and Suggestions for Authors

In this resubmitted paper, the authors have made the changes I requested previously. I still have some suggestions to improve the quality of this paper, which are as follows:

1) Language presentation needs to be improved. There are occasional grammatical issues and awkward phrasings.

2) The angle synthesis algorithm is well explained, but the section would benefit from a clearer explanation of how the sparse representation coefficients are derived and why this approach is effective for SAR images.

3) Why are cosine similarity and periodic angle loss chosen, and how do they complement each other? (See the sketch after this report.)

4) The ablation study is thorough, but the results could be better contextualized. For example, what do the improvements in FID, SSIM, and SNR mean in practical terms for SAR image generation?

Comments on the Quality of English Language

Language presentation needs to be improved.
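As one illustrative reading of point 3 (a generic sketch, not the authors' exact formulation): a cosine-similarity term is magnitude-invariant, so it constrains the structure of feature embeddings rather than their amplitude, while a periodic angle loss respects the 0°/360° wraparound of azimuth that a plain L1/L2 penalty on angles would violate.

```python
import torch
import torch.nn.functional as F

def periodic_angle_loss(pred_deg, target_deg):
    """Distance on the unit circle: treats 359 deg vs. 1 deg as a
    2-degree error rather than a 358-degree one."""
    delta = torch.deg2rad(pred_deg - target_deg)
    return (1.0 - torch.cos(delta)).mean()

def cosine_feature_loss(feat_gen, feat_ref):
    """Magnitude-invariant penalty on feature embeddings: constrains their
    direction (structure), complementing the angle term above."""
    return (1.0 - F.cosine_similarity(feat_gen, feat_ref, dim=1)).mean()

# Toy check: a 2-degree wraparound error yields a near-zero loss.
print(periodic_angle_loss(torch.tensor([359.0]), torch.tensor([1.0])))
```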

Author Response

 

Reviewer 2

In this resubmitted paper, the authors have made the changes as I asked before. I still have some suggestions to improve the quality of this paper, which are as follows:

AR: We thank you for your positive comments on the manuscript. Below, we provide point-by-point responses to the remaining concerns. Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report (Previous Reviewer 2)

Comments and Suggestions for Authors

The authors addressed all my previous comments and now I recommend to publish the article in presented form.

Author Response

The authors addressed all my previous comments and now I recommend to publish the article in presented form.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report (Previous Reviewer 5)

Comments and Suggestions for Authors

I have no more questions.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Although the topic considered in this work is of great interest to many researchers in the field of SAR ATR, I am not convinced of the contribution of this work due to the various problems in its presentation, equations, simulation results, and references.

 

Presentation

In Line 346 on pp.8, it is written that “An in-depth analysis of this issue is presented in Chapter 4 with experimental evidence.” This is not a thesis or a book; please do not refer to the sections as chapters.

 

 

Equations

Since the equations are not numbered, I will only briefly mention the problems I noticed. The authors fail to define the symbols properly, and the equations in Lines 362-364 on pp. 9 are erroneous. In Line 286 on pp. 7, the constraint is set as ‖α‖₀ < K, but in Line 295 on pp. 7 it becomes ‖α‖₀ = K. I checked the corresponding reference provided by the authors, i.e., [64], but only found a paper about the performance metric FSIM (“FSIM: a feature similarity index for image quality assessment”). Also, it is said that M is empirically set to 5; what about K? The value of K is not mentioned in the experiment section.
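For concreteness, a hard constraint ‖α‖₀ ≤ K is typically enforced with a greedy pursuit such as orthogonal matching pursuit (OMP); the sketch below is a generic NumPy illustration, not the authors' code. It also shows why the inequality is the natural statement of the constraint: the greedy pass selects at most K distinct atoms, reaching exactly K only when no atom is reselected.

```python
import numpy as np

def omp(D, y, K):
    """Orthogonal matching pursuit: build alpha with ||alpha||_0 <= K so that
    D @ alpha approximates y. D is an (n, m) dictionary with unit-norm columns
    (e.g., vectorized SAR images at neighboring azimuths); y is an (n,) target."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(K):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Jointly refit the selected atoms by least squares, then update.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    alpha = np.zeros(D.shape[1])
    alpha[support] = coeffs
    return alpha
```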

 

Simulation Results

1. The claim made by the authors in Line 437 on pp.10, “The MSTAR dataset comprises 21,600 images, …, capturing targets at angular intervals of 1 degree”, is not correct. The interval is greater than 1 degree, and the average is about 3 degrees.

2. In Sec. 4.1, it is said that MSTAR and CV-SARID are Datasets 1 and 2, respectively. For Dataset 2, each category comprises 71 images captured at pitch angles of 25° and 30° and 70 images captured at a pitch angle of 45°. However, starting from Fig. 7 (Section 4.3) on pp. 12, the authors refer to the CV-SARID dataset as Dataset 1 and the MSTAR dataset as Dataset 2. By the end, the authors became so confused that they drew the conclusion that “This results in an improvement exceeding 6% and 3% in recognition accuracy evaluated on the CV-SARID dataset and MSTAR datasets, respectively” without presenting the results for the MSTAR dataset.

3. In Line 486, it is written that “Figure 5 presents the generative results on the two datasets”. I believe it should be Fig. 7.

4. The authors should provide target class labels for the figures in Fig. 8.

5. The caption for Fig. 9 is separated from the figure itself.

6. Table 6 is mislabeled as Table 5.

7. The captions for Fig. 11 and Fig. 12 are exactly the same, as are the captions for Fig. 13 and Fig. 14. Besides, based on the information provided on pp.16, Fig. 12 and Fig. 14 should present the results for the 45-degree pitch angle test data, right? But now the caption for Fig. 12 reads “(trained on images with 30-degree pitch angle and evaluated on the 25-degree ones)”, and the caption for Fig. 14 reads “(trained on images with 25-degree pitch angle and evaluated on the 30-degree ones)”.

8. SAR images do not reflect color information, yet in Section 4.1 the authors describe the ten types of targets in Dataset 2 as blue truck, red truck, white van, etc., and do not provide optical images for these targets. As a result, it is difficult to tell the size and the shape of the targets. The effectiveness of a new method cannot be evaluated solely based on the target recognition accuracy (which is a single number); the results have to be explainable. However, according to the confusion matrices provided by the authors in Fig. 12 and Fig. 14, before augmentation, Target #1 is often mistaken for #9. After augmentation, a large portion of Target #1 is mistaken for #0 and for #5 in Trials #3 and #4, respectively. If the target class labels are assigned based on their order in Fig. 6 (the relationship between the class label number and the class name is not provided), then Target #0 is a sedan, Target #1 is the Toyota off-road vehicle, Target #5 is the Land Cruiser off-road vehicle, and Target #9 is a gray van. This indicates that the performance of the proposed method is not stable.

9. Dataset 2 in Table 1 is written as “Dataste2”.

 

Conclusion

Since the authors only provided the experimental results for the CV-SARID dataset in Section 4.5, how did they draw the conclusion that “This results in an improvement exceeding 6% and 3% in recognition accuracy evaluated on the CV-SARID dataset and MSTAR datasets, respectively”? Don't the results presented in Table 6 and Figures 12 and 14 for the 45-degree pitch angle experiment correspond to the CV-SARID dataset (70 samples per category)?

 

References

1. In Lines 138-141 on pp. 3, the authors cited [46]-[49] and provided brief summaries of these works. Unfortunately, the cited works don't match the descriptions. For example, FIGR corresponds to [44] instead of [46], while DAWSON corresponds to [45] rather than [47]. Similarly, neither [48] nor [49] matches the description the authors provided.

2. In Line 166 on pp. 4, it is written that “Zheng et al. introduce … [58]”. However, [58] is a paper authored by Jiang et al. and has nothing to do with “label smoothing regularization for generating type-ambiguous SAR target images”.

3. Acronyms like GAN, CNN, SAR and ATR in [3], [10], [11], [13], [15]-[21], [27], [50], [52], [54], [56], [57], [59] should be capitalized.

4. The authors adopt inconsistent citation styles for the works in the reference list. Please check the template provided by Remote Sensing and make sure that the citation style of each reference work meets the requirement.

Reviewer 2 Report

Comments and Suggestions for Authors

The evaluated article is an application of SAR image generation for a precisely defined purpose. SAR applications have been known for a relatively long time; this article presents a new approach to data processing using the newly proposed framework for SAR image generation. I consider the article relevant to the topic of SAR. The authors document the results in a series of key tables (3-5) and figures (11-14) using two datasets, which I consider to be an important contribution of the evaluated article.

The references describe the topic covered in the article, and I consider their number appropriate. I consider the conclusions resulting from the presented research to be correct, but the formal processing of the article is poor (see the comments below); therefore, I recommend that the article be considerably reworked.

Comments and questions:

 

1. Please revise the sentence in line 79; there are repeated words ("... achieving achieve ...").

2. The numbering of figures starts with number 2.

3. Is LFM the Local Fusion Module or a linear frequency modulator (in the text explaining Figure 2)? When the text says that the encoder is E and the decoder is H, please mark these functional blocks in Figure 2 accordingly. Explain the abbreviation AZS in Figure 2.

4. Which figure is Figure 9? Is it the uncaptioned figure at the end of page 14?

5. Where is Table 2? There are two tables numbered 5. Please renumber the tables.

6. Figures 11 and 12 have the same caption, as do Figures 13 and 14.

Reviewer 3 Report

Comments and Suggestions for Authors

1. It is necessary to analyze in more detail how computational efficiency and the complex module structure affect the final results. Using multiple modules (such as the LFM, sparse representation, and the dual discriminator) increases the training time and GPU memory usage.

2. It is necessary to design an efficient processing flow or to optimize the proposed algorithm. Currently, the model requires about 17.4 GB of GPU memory, which is not suitable for real-time processing; a lightweight design and hardware optimization are needed.

3. The experiments performed on two datasets (MSTAR and CV-SARID) are promising, but further explanation is needed on whether the model generalizes well to various environmental conditions or different types of targets.

4. It is well known that GAN-based models can suffer from issues related to learning stability and convergence. Including additional analysis or visual representations (e.g., tracking the evolution of the loss functions, presenting learning curves, or displaying intermediate output images) would help readers understand the learning dynamics and module interactions of the model more clearly (see the sketch below).
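One direct way to address point 4 is to record both adversarial losses at every step and plot smoothed learning curves. The sketch below uses synthetic stand-in values; the real curves would come from the model's logged losses:

```python
import numpy as np
import matplotlib.pyplot as plt

def moving_average(x, w=50):
    """Smooth noisy per-step losses so trends remain visible."""
    return np.convolve(x, np.ones(w) / w, mode="valid")

# Synthetic stand-ins for logged discriminator/generator losses.
steps = np.arange(5000)
rng = np.random.default_rng(0)
d_loss = 0.7 + 0.3 * np.exp(-steps / 1500) + 0.05 * rng.standard_normal(steps.size)
g_loss = 0.8 + 1.5 * np.exp(-steps / 2000) + 0.08 * rng.standard_normal(steps.size)

plt.plot(moving_average(d_loss), label="discriminator")
plt.plot(moving_average(g_loss), label="generator")
plt.xlabel("training step")
plt.ylabel("smoothed loss")
plt.legend()
plt.savefig("learning_curves.png", dpi=150)
```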

Reviewer 4 Report

Comments and Suggestions for Authors

In this paper, an advanced angle-controllable synthetic aperture radar (SAR) image generation model is designed for the purpose of few-sample recognition. The well-trained generative adversarial network (GAN) is capable of generating a sufficient number of SAR images within expected angle intervals. To validate the performance of the proposed model, simulations were conducted and yielded satisfactory results. Overall, the topic of this research is novel, and the manuscript is clearly presented and well written. The paper can be accepted after a minor revision.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

The English language needs to be further improved.

Reviewer 5 Report

Comments and Suggestions for Authors

This paper presents a novel approach for generating azimuth-angle-controllable SAR images using a GAN-based framework, which introduces a Local Fusion Module (LFM), a controllable angle generation module, and an angle discrimination module, aiming to address the challenge of limited SAR target samples. The manuscript is generally well written, and the experimental results show that the proposed method enhances SAR target recognition accuracy. My comments are as follows:

1) The overfitting issue is often encountered with deep learning-based methods, especially given the limited number of samples used for training in this study. A discussion of how the proposed method mitigates overfitting, or an analysis of the model's performance with different numbers of training samples, is therefore necessary.

2) The process of how the angle synthesis algorithm is frozen during training and how it is activated during testing could be explained more clearly (see the sketch after this list).

3) The role of the random coefficient vector in the LFM could be elaborated on further, especially how it affects the diversity of the generated images.

4) The experiments are conducted on two datasets that both contain vehicle targets. How does the proposed method perform when used to recognize other types of targets?

5) More ablation experiments are needed, for example, an ablation study analyzing the contribution of the different loss functions.

6) What is the computational efficiency of the proposed method?
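On question 2, freezing is commonly implemented in PyTorch by disabling gradients on the module's parameters during training and running it under inference mode at test time. The sketch below uses a hypothetical stand-in module, not the authors' actual angle synthesis code:

```python
import torch
import torch.nn as nn

class AngleSynthesis(nn.Module):
    """Hypothetical stand-in for the paper's angle synthesis module."""
    def __init__(self, dim=128):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.proj(x)

synth = AngleSynthesis()

# "Frozen" during training: gradients never reach these parameters, so the
# optimizer leaves them untouched even when their output feeds a loss.
for p in synth.parameters():
    p.requires_grad = False
synth.eval()  # also fixes any dropout / batch-norm statistics

# "Activated" at test time: the module simply runs in the forward pass;
# inference_mode skips autograd bookkeeping entirely.
with torch.inference_mode():
    synthesized = synth(torch.randn(4, 128))
print(synthesized.shape)
```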
