Review Reports - Small Object Detection in Synthetic Aperture Radar with Modular Feature Encoding and Vectorized Box Regression

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Dear authors, thank you for submitting your article. I must say that this is the third similar article on a similar topic in two months, which is strange.

In your article, you write about small object detection in SAR imagery. This is challenging due to low resolution, small object sizes, and complex backgrounds, which standard detectors struggle to handle. DVDNet, a specialized CNN framework, addresses these issues using grouped-deformable convolutions, texture enhancement via LBP, and vector-based bounding box regression to improve detection of small, oriented objects. You write that experimental results show that DVDNet significantly outperforms standard detectors on multiple datasets, achieving high precision and recall, especially for small object subsets.

Once again, as in previous articles, there is no section providing a better description of the data. This should certainly precede the description of the neural network technology. In particular, why are you using free data files? It seems to be a trend: take a free data file, apply known neural network techniques, modify them, compare them with other techniques, create a table, and claim that yours is the best. Sorry, but there are too many articles like this. No reason is given for the research. And why are you combining SAR data with images of people from the VOC2012 set? This definitely needs to be explained. It completely confuses the reader.

There is also no link to any grant; surely the work is supported by something, and I would like to know if it is related to those previous similar articles. Take a look at your colleagues' articles to find out why Chinese authors have recently focused on searching for ships in SAR data. This is not a problem, of course, but it is necessary to define the use and reason for this research and what is new about it. It's probably some new trend. I have no idea what's behind it. Military use? State interest?

Define the purpose and reason for conducting this research. I know of at least four similar articles on searching for small boats in SAR data, and these are from different workplaces. It would be logical to teach a neural network to search for ships if there is a reason to do so, and to try it on real data, not on some free data set. The data there is certainly available only in excerpts and has been transformed, for example, into 8 bits. You don't comment on that at all.

In practice, it doesn't matter whether you search for ships in SAR data or in other raster images. It's just that they're not as easy to see in SAR data, right? You're not exploiting the potential of multispectral or radar data; you're just searching for certain primitives in raster images? Please, I am not disparaging your results in any way, but please respond to my questions.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

Targeting the small object detection task in SAR imagery, this paper proposes the Deformable Vectorized Detection Network (DVDNet), which combines three key innovations in a single framework: Grouped-Deformable Convolution (GDConv) layers that provide adaptive receptive fields for multi-scale and deformable object shapes; an LBP Enhancement Module that injects rich local texture features into the CNN feature maps to help distinguish objects from complex backgrounds; and a Vector Decomposition Module that predicts each object's bounding box using a pair of vectors, enabling an effective and continuous representation of oriented boxes without angular ambiguities. The structure is clear, and extensive experiments validate the high average precision of DVDNet on small object detection in SAR.

The related work section is presented in a listing style, without a structured synthesis of technological trends and methodological paradigms. I suggest condensing it.
The comparison experiments are sufficient. If possible, provide some insight from a comparison with the recent SAR object detection work "STCADeNet: Spatial-temporal context awareness for video SAR shadow detection".
Since DVDNet is proposed for small object detection in SAR images, the images in the framework diagram in Figure 1 should be replaced with SAR images. Also, what is the merit of including optical datasets in the experiments?
The sub-captions and sub-titles in Figures 3 and 4 are too large and not consistent with the others.
Figure 8 and Tables 3 and 4 should be reformatted.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

This paper investigates the challenges associated with small target detection in synthetic aperture radar (SAR) imagery and validates the proposed method using publicly available datasets. However, the method's originality is limited, as it represents an incremental improvement over existing approaches. The paper presents several issues that require further attention:

1. The paper specifies that the training epoch is set to 12. Considering that YOLO series models typically require more training iterations to fully capture data characteristics, the current setting may not adequately reflect the model's performance potential. It is recommended to conduct additional comparative experiments with YOLO-based models under extended training conditions to more comprehensively evaluate the performance advantages of DVDNet and mitigate potential biases caused by insufficient training.

2. The paper includes structural diagrams of key modules such as GDConv and the LBP enhancement module (e.g., Figures 3 and 4). However, these diagrams currently lack detailed internal information, such as input/output dimensions and core architectural designs, as well as their integration and data flow relationships within the overall network architecture. It is suggested to refine the diagram design by clearly labeling module inputs and outputs, their placement within the network, and the direction of data flow, thereby enhancing the clarity of module functions and interdependencies.

3. The paper provides an overview of datasets such as HRSID, SSDD, HRSC2016, and Pascal VOC 2012 but omits specific details regarding the partitioning of training, validation, and test sets, as well as the input image dimensions. Proper dataset partitioning is essential for ensuring experimental reproducibility, and image size significantly influences the performance of small target detection models. It is recommended to include these details to strengthen the credibility and replicability of the experimental results.

4. The paper emphasizes that DVDNet is specifically designed for "small target detection" and reports significant improvements in precision and recall for small targets. However, the experimental results do not provide separate performance metrics for targets of varying scales (e.g., small, medium, and large). It is recommended to include such detailed results to directly demonstrate DVDNet's superior performance in detecting small targets through comparative analysis.

5. The paper states that the computational complexity (FLOPs) of DVDNet is 40% lower than that of YOLOv5s but does not provide specific numerical values or comparisons with other methods in terms of computational cost and inference speed (e.g., FPS). It is recommended to supplement these engineering metrics to offer a more comprehensive evaluation of the model's suitability for deployment in resource-constrained environments and to enhance its practical relevance.

6. The paper specifies that the number of groups G in GDConv is set to 4 but does not justify this choice through experimental analysis or discuss the impact of varying G values on model performance (e.g., mAP) and computational efficiency (e.g., parameter count). It is suggested to include ablation studies that evaluate the relationship between group count and model effectiveness and efficiency, thereby providing stronger empirical support for the selected parameter configuration.

7. The paper presents visualizations of detection results from DVDNet across multiple datasets but does not compare these results with those of baseline methods such as Faster R-CNN or other competing approaches. It is recommended to supplement such comparative visualizations to more clearly illustrate DVDNet's advantages in terms of localization accuracy and robustness to background interference.

8. The paper indicates that GDConv is primarily applied in the conv3 and conv4 stages of ResNet-50, with the rationale that these layers are more effective for capturing multi-scale features. It is suggested to conduct comparative experiments by applying GDConv at other network depths (e.g., shallow conv2 and deeper conv5) and analyze the impact of placement on model performance (e.g., mAP) and computational cost, thereby further validating the design's effectiveness and methodological rigor.

9. The paper selects several classical detection algorithms as baselines, which provide a solid foundation for method validation. However, the comparison does not sufficiently cover algorithms specifically designed for "small target detection" or "SAR scene detection." It is recommended to include recent algorithms that focus on "small target detection in SAR scenes" to better highlight DVDNet's advantages in this specific application domain.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Dear authors, thank you for submitting your improved article.

I see that you have changed, completed, and explained a lot in the article. I don't think I have any further comments. I still think that there are quite a few similar articles out there, and I don't see much value or science in them. You have a set of SAR data, you train on different neural networks, and then you find the best one. I don't know where the real science and progress in the field is. I personally wrote an article on the use of neural networks for satellite data, and the article was rejected on the grounds that it was not scientific. So what can I add to that? I'm not enthusiastic about similar articles, but I understand that it's modern.