Peer-Review Record

ADT-Det: Adaptive Dynamic Refined Single-Stage Transformer Detector for Arbitrary-Oriented Object Detection in Satellite Optical Imagery

Remote Sens. 2021, 13(13), 2623; https://doi.org/10.3390/rs13132623
by Yongbin Zheng *,†, Peng Sun †, Zongtan Zhou, Wanying Xu and Qiang Ren
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 19 May 2021 / Revised: 25 June 2021 / Accepted: 30 June 2021 / Published: 4 July 2021
(This article belongs to the Special Issue Advances in Object and Activity Detection in Remote Sensing Imagery)

Round 1

Reviewer 1 Report

The manuscript is interesting and well organized overall. Below, the authors can find some suggestions for improving their paper:

  1. In the Abstract, the paper's contributions and the problem you are trying to solve should be formulated more clearly.
  2. At the end of the Abstract, where you say "Experiments on two challenging satellite imagery public datasets, DOTA and HRSC2016, demonstrate the proposed ADT-Det detector achieves the state-of-the-art detection accuracy while running very fast [...]", you should also provide some numbers to support these claims. Quantifying the results will make your paper more technically sound and more professional.
  3. Figure captions are shallow and must be improved.
  4. At the end of the Related Work section, the authors should explain how their work builds upon the state of the art: what is the same and what is different.
  5. The authors should motivate why they have chosen a feature pyramid transformer and why other neural network architectures are not suitable for this type of application.
  6. Equations should be explained better and referenced in the text.
  7. How are the weighting factor and the modulating factor chosen?
  8. I think a table comparing the speed of the proposed solution with other state-of-the-art solutions would be useful.
  9. More details regarding the proposed solution should be given.
  10. In Section 4.3.3, you should begin the sentence with a capital "P", not a lowercase letter. There are multiple typos and sentences in your paper that need rephrasing to make it clearer. You should fix all such issues.
  11. In the Related Work section, you could mark the topics for which you review the state of the art (e.g., Spatial Transformer Network, Refined Object Detectors, Feature Pyramid Networks) in a distinct way, so that the reader knows where a new subsection begins. You could use an italic font, leave the subsection name on its own line, and begin the actual review on the next line.
  12. Why did the authors use only neural networks in their solution? For example, why are feature engineering approaches not useful here?
  13. You should add a few more state-of-the-art solutions to the Related Work section.

Author Response

Thank you very much for reviewing our manuscript remotesensing-1247463. This will help us to improve it to a better scientific level. We have carefully studied your comments and made corrections, hoping to meet with approval. Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

In this paper, an adaptive dynamic refined single-stage transformer detector for rotated object detection in satellite imagery is proposed. A very interesting part is the comparison with well-known object detection methods, but the paper has serious drawbacks and cannot be accepted for publication in its present form. The comments follow.

  1. The order of the figures has to be changed. For example, Figures 2 and 10 are described first, and Figure 1 is described last.
  2. Figure descriptions should be optimized.
  3. In the first sentence of the Abstract, "romote sensing" is used instead of "remote sensing".
  4. In the Keywords section, "satellite Imagery" is written with a capital letter I.
  5. What is Mr in Equation (3)?
  6. What is the SoftMax function in Equation (4)?
  7. Where is Y in Equation (5)?
  8. What is the L1 loss?
  9. What is y in Equation (10)?
  10. On what basis is the accuracy of the proposed and other algorithms calculated? How are these percentages obtained?

Author Response

Thank you very much for reviewing our manuscript remotesensing-1247463. This will help us to improve it to a better scientific level. We have carefully studied your comments and made corrections, hoping to meet with approval.

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors present ADT-Det, a RetinaNet-based neural network able to rotate the bounding box dynamically to improve object recognition and classification.

The work is interesting, but the English must be thoroughly revised. Furthermore, Table 4 is missing the input image size for each network, which is needed to properly assess the FPS figures.

The implementation details section is missing information regarding the training techniques; for example, was the network trained in multiple stages?

Author Response

Thank you very much for reviewing our manuscript remotesensing-1247463. This will help us to improve it to a better scientific level. We have carefully studied your comments and made corrections, hoping to meet with approval.

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

In this manuscript, the Authors propose a deep learning approach for fast object identification within satellite images.

The paper is well written, although many typos (see some below) require careful attention.

The introduction is fine, and the illustrations are indeed nice and very explanatory (this reviewer enjoyed them very much). Also, the way the loss function is built and explained is very academic (fine). Among the many deep-learning papers this reviewer has seen, this one is very attractive and well discussed.

The experimental setup is extensive and convincing.

No concerns.


Typos and suggestions:

- consider adding "optical images" somewhere, because satellite imagery covers more than optical data (SAR, PolSAR, ...).

  • Figure 1 is cited in the text too late (line 406!),
  • line 96, " has become has received..", revise,
  • line 106, consider adding a subsection heading to avoid "Multi-Stage Object Detectors.  The framework ..:",
  • line 201: "comcatenaed", revise,
  • caption of Figure 6, "comcatenaed",
  • equation (2): what is the meaning of \cdot? Convolution? It is recommended to use another symbol.


Author Response

Thank you very much for reviewing our manuscript remotesensing-1247463. This will help us to improve it to a better scientific level. We have carefully studied your comments and made corrections, hoping to meet with approval.

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have addressed the reviewer's comments.

Author Response

Thanks again for your comment. This helps us improve it to a better scientific level. We have carefully studied your comments and revised some typos, hoping to meet with approval. 

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors have improved both the sound of the research and the English spelling of their work.

There is still something unclear to me in the following sentence:

"Our ADT-Det detector could achieve an 89.75% accuracy when evaluated under the PASCAL VOC2007 metrics and a 12 fps speed when the input image size was 800×800. Furthermore, we could achieve a 14.6fps speed when the input image size was 600×600."

This is because Table 4 reports that 14.6 fps is reached with an input size of 800×800.

Regards.

Author Response

Thanks again for reviewing our manuscript. This helps us improve it to a better scientific level. We have carefully studied your comments and made corrections, hoping to meet with approval.

Please see the attachment.

Author Response File: Author Response.pdf
