Article
Peer-Review Record

Parking Pattern Guided Vehicle and Aircraft Detection in Aligned SAR-EO Aerial View Images

Remote Sens. 2025, 17(16), 2808; https://doi.org/10.3390/rs17162808
by Zhe Geng *, Shiyu Zhang, Yu Zhang, Chongqi Xu, Linyi Wu and Daiyin Zhu
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 26 June 2025 / Revised: 8 August 2025 / Accepted: 11 August 2025 / Published: 13 August 2025
(This article belongs to the Special Issue Applications of SAR for Environment Observation Analysis)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

See attached file.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This study leverages vehicle and aircraft parking patterns derived from aligned EO-SAR image pairs to enable context-aware automatic target recognition. The paper demonstrates substantial experimental content with relatively comprehensive validation results. However, the literature review of existing methods remains insufficient, and the presentation of the proposed methodology lacks clear logical structure. Significant revisions to the core algorithm description are required.

1) In the introduction section, the discussion of target detection methods based on EO-SAR image pairs is primarily covered in the second paragraph on page 3. However, this paragraph focuses more on introducing multimodal datasets rather than elaborating on multimodal processing methods. Consequently, readers may find it difficult to distinguish the differences and unique features between the proposed method and existing approaches. It is recommended that the authors expand the literature review on target detection methods utilizing multimodal EO-SAR image pairs to better highlight the comparative advantages and innovations of this work.

2) In the abstract and introduction, the authors did not mention the challenges of target detection under OOD (Out-of-Distribution) scenes. The sudden appearance of OOD in Section II feels abrupt. 

Judging by its title, Section II should introduce the proposed method, yet it devotes significant space to reviewing existing algorithms for OOD scenarios, which seems misplaced. It is recommended that the authors add a dedicated section to elaborate on concepts relevant to their algorithm, as well as the theoretical foundations underlying their approach.

This would help readers better understand the motivation, challenges, and technical basis of the proposed method before diving into implementation details. A clear conceptual and theoretical background would also strengthen the logical flow of the paper.

The paper would benefit from including a complete network architecture diagram or detailed algorithm flowchart to better illustrate the implementation of the proposed method. Currently, Figures 2 and 4 alone do not sufficiently show how the core innovations (such as the CGRID-Key strategy, the adaptive weighting mechanism between EO/SAR modalities, and class-incremental learning) are specifically implemented in the system. A more comprehensive visual representation of the overall pipeline, key components, and their interconnections would greatly help readers understand the technical details and reproducibility of this work.

Additionally, the implementation flow on page 7 mentions Model 0 and Model 1 without a clear definition: what exactly do these models represent? The current algorithm description in Section II lacks sufficient detail for readers to fully understand the implementation process.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This paper proposes to improve target classification performance in aerial view imagery by using SAR and EO image pairs and exploiting parking patterns. Specifically, a class-incremental learning method based on an exemplar selection strategy is proposed to achieve continuous learning in an open-world environment. Experimental results on the UNICORN dataset and a self-built dataset demonstrate that the proposed network outperforms some existing baseline models in terms of recognition performance. In general, the writing of this paper overemphasizes trivial details and lacks overall logic and readability. The method and experiment descriptions are not well organized, which makes the paper difficult to understand and follow. Also, the research idea and goal of this work are not well focused, and the technical information and experimental material conveyed in the paper are somewhat convoluted. As a result, the main innovations and values of this work are not highlighted. Therefore, significant revision is needed to improve its quality, and this paper may not be acceptable in its current form.

Major comments:

  1. A flowchart should be added in the paper to clearly describe the overall process of the proposed method.
  2. The authors should clearly define the specific problems they are trying to address.
  3. Although the motivations of this work are clear, the technical details on how to realize these goals are still not clear.
  4. Figs. 10, 15, 19, and 21 seem to focus on different tasks, which should be more concentrated.

Minor comments:

  1. In section 2.1, line 153, "ODD" should be "OOD".
  2. Please add detailed explanations for the right part of Figure 4.
  3. In section 2.2, line 241, there is an extra ")" in step 4.
  4. In line 383, the figure citation should be "Fig. 20".

Comments on the Quality of English Language

The logic and readability of this paper need to be improved.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

This study takes a step forward from traditional SAR-based target recognition methods by using aligned EO (electro-optical) and SAR image pairs to overcome some of the limitations of SAR alone.

By effectively leveraging contextual information, such as vehicle parking patterns and surrounding structures like jet bridges for aircraft, the method improves recognition accuracy.

To make the most of EO-SAR paired data, the authors employ two main techniques:

CATRM (Cross-Modality ATR Module)

CGRID-Key (Context-Guided Re-Identification Key-view selection)

The approach shows strong performance, especially in detecting vehicles and aircraft, and demonstrates the potential of EO-SAR fusion for multi-sensor recognition tasks.

However, a few limitations need to be addressed:

1) Dependence on well-aligned EO-SAR image pairs

While the method performs well, it relies heavily on accurately aligned EO-SAR image pairs.

In real-world applications, achieving precise spatial and temporal alignment can be challenging due to differences in sensor types, platforms, and acquisition timing.

This fundamental limitation is not clearly discussed in the paper. It would be helpful if the authors acknowledged this issue and discussed how misalignment might affect performance, as well as how it could be addressed in future work.

2) Lack of detail on observation time and target identification

The paper doesn’t clearly explain how long each region or target (like cars or aircraft) was observed.

In practice, vehicles and aircraft often appear briefly and then leave, so it’s important to ask:

Can this method actually identify a specific vehicle or aircraft, or does it simply detect that something was present at a location?

Also, if multiple targets come and go in the same area, it’s unclear whether the system can distinguish between them.

Clarifying these points, especially regarding temporal resolution and whether the system can tell targets apart, would help better define the method's capabilities.

3) Unclear spatial scale of observation areas

The paper does not mention how large the EO-SAR image regions are.

So it’s hard to tell whether the proposed method can be used for wide-area surveillance, or if it’s limited to small, localized zones.

It would be beneficial if the authors provided quantitative information about the spatial size and resolution of the areas used in their experiments, to give readers a clearer sense of the method’s real-world applicability.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have comprehensively addressed all previous concerns, particularly by adding an overall algorithm framework diagram that significantly improves readers' understanding of the proposed methodology. The manuscript demonstrates substantial research effort and presents satisfactory experimental results. However, the authors' professional academic writing skills appear somewhat underdeveloped, which affects the clarity of presentation.

1) Section 3.1.1 presents the experimental results of parking lot recognition only as quantitative data in Table 1. Is there a way to visually display the parking lot recognition results using images? The same consideration applies to subsequent classification results (e.g., in Table 4). For complex scenes containing multiple vehicle types, could the authors provide visualizations of the classification outcomes? This would significantly enhance the results' interpretability. While such visual demonstrations would be ideal, the authors may omit this request if visualization proves unfeasible.

2) How were the labels for multiple vehicle types in Figure 13 obtained? For example, in the '0 Sedans' image, does the entire region contain only sedans?

3) The statement on page 18, "During the review process, a reviewer suggested that we should supplement comparisons with other methods to highlight the advantages of the proposed method," is inappropriate. The main text should focus on explaining the methodology and its implementation. Reviewer comments and corresponding responses should appear only in the revision documentation (e.g., the response letter), not in the manuscript body. Please thoroughly review the main text.

4) What do the three panels in Figure 18 (from left to right) respectively represent? There appear to be no panel labels.

5) Given the differing input modalities (single-view/unlabeled vs. multi-view/labeled) between the airplane ATR (3A) and vehicle ATR (3B) in Fig. 2, are there corresponding architectural differences in their network designs? In other words, do these two target types (aircraft vs. vehicles) share the same ATR network architecture, or are they processed differently? I find it difficult to distinguish their detection methodologies and experimental results. Could the authors clarify this with specific architectural or operational comparisons?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

No further comment.

Author Response

Thank you.
