Open Access Article

Peer-Review Record

DHQ-DETR: Distributed and High-Quality Object Query for Enhanced Dense Detection in Remote Sensing

Remote Sens. 2025, 17(3), 514; https://doi.org/10.3390/rs17030514

by Chenglong Li, Jianwei Zhang *, Bihan Huo and Yingjian Xue

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 20 January 2025 / Revised: 28 January 2025 / Accepted: 30 January 2025 / Published: 1 February 2025

(This article belongs to the Special Issue Multi-platform and Multi-modal Remote Sensing Data Fusion with Advanced Deep Learning Techniques (Second Edition))

Round 1

Reviewer 1 Report (Previous Reviewer 1)

Comments and Suggestions for Authors

The revised manuscript has addressed all of my comments satisfactorily, and the current version is recommended for publication.

Author Response

Thank you very much for your valuable comments and for taking the time to review our manuscript. We greatly appreciate your positive feedback regarding the revisions we have made. We are pleased to know that the current version of the manuscript meets your expectations and is recommended for publication.

Reviewer 2 Report (Previous Reviewer 3)

Comments and Suggestions for Authors

In the revised version, the authors emphasize the connection between the research method and remote sensing, and add the remote sensing dataset DOTA for experimental analysis. In general, the revised draft has been greatly improved. However, some concerns remain.

1. What is the difference between remote sensing image target detection and natural image target detection?

2. Given the limited resolution of remote sensing images and the influence of complex environments, can natural image object detection methods be applied directly to remote sensing images? In particular, when the remote sensing scene is large and the targets are small, can the small targets be detected accurately?

Author Response

Comments 1: What is the difference between remote sensing image target detection and natural image target detection?

Response 1: Thank you for your feedback. In the dataset introduction section of our paper, we specifically addressed the differences between remote sensing image datasets and natural image datasets. In particular, we discussed the challenges of target detection in remote sensing images, such as small objects, high density, and complex backgrounds, which differ considerably from those of target detection in natural images. We hope this explanation clarifies the distinction between remote sensing image target detection and natural image target detection. The relevant content can be found in Lines 333-346.

Comments 2: Given the limited resolution of remote sensing images and the influence of complex environments, can natural image object detection methods be applied directly to remote sensing images? In particular, when the remote sensing scene is large and the targets are small, can the small targets be detected accurately?

Response 2: Thank you for your insightful comments on the application of natural image object detection methods to remote sensing imagery, especially in the context of large scenes and small targets. We concur that these challenges are pivotal and deserve detailed exploration. Our proposed method, featuring an improved decoding update strategy for small target boxes, offers enhanced adaptability for remote sensing images compared to conventional end-to-end detectors. Nonetheless, we recognize that for high-resolution remote sensing images, it is essential to combine a general target detection framework with a divide-and-conquer strategy. We have enriched our discussion accordingly. You can find these additions in Lines 434-454.
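The divide-and-conquer strategy mentioned in Response 2 is commonly realized by cutting a large scene into overlapping tiles, running the detector on each tile, and shifting the resulting boxes back into scene coordinates. The following is only a minimal sketch of that general idea, not the authors' implementation; the function names `tile_windows` and `to_global`, the tile size of 1024 and the overlap of 200 pixels are all assumptions chosen for illustration.

```python
def tile_windows(width, height, tile=1024, overlap=200):
    """Return (x0, y0, x1, y1) windows that together cover the whole image."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    stride = tile - overlap

    def starts(size):
        # An image no larger than one tile needs a single window at 0.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        # Align the last window with the image border so nothing is missed.
        positions.append(size - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


def to_global(box, window):
    """Shift a tile-local (x0, y0, x1, y1) box into full-image coordinates."""
    off_x, off_y = window[0], window[1]
    bx0, by0, bx1, by1 = box
    return (bx0 + off_x, by0 + off_y, bx1 + off_x, by1 + off_y)


if __name__ == "__main__":
    for window in tile_windows(2048, 1024):
        print(window)
```

The overlap keeps an object that straddles a tile border fully visible in at least one tile, provided the object is smaller than the overlap. Detections from neighbouring tiles then need a duplicate-removal step, such as non-maximum suppression, which is omitted here.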

Author Response File: Author Response.pdf

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript proposes a DHQ-DETR framework, which integrates the Distribution Focal Loss (DFL) to improve residual learning and incorporates a High-Quality Query Selection (HQQS) module to effectively balance the classification and regression tasks. Some comments are as follows.

1. Suggest adding a section outlining the paper's organization.

2. Please adjust the symbol size in Figure 4 for clearer display.

3. It is suggested to add more recent relevant comparison models in the experiments.

Author Response

Comments 1: Suggest adding a section outlining the paper's organization.

Response 1: Thank you for your suggestion. We agree with your comment. To improve the clarity of the paper's structure, we have added a new section outlining the organization of the paper. This section can be found on Lines 92-98.

Comments 2: Please adjust the symbol size in Figure 4 for clearer display.

Response 2: Thank you for pointing this out. We agree that the symbol size in Figure 4 can be improved for better visibility. We have adjusted the symbol size, making them larger and clearer. This change can be seen in Figure 3, Page 6.

Comments 3: It is suggested to add more recent relevant comparison models in the experiments.

Response 3: Thank you for your valuable suggestion. We have updated the experimental section by adding more recent comparison models to strengthen our results. These models have been integrated into the discussion of the experiments, and the additional comparisons can be found in Table 2 (Page 13) and Table 3 (Page 14).

Reviewer 2 Report

Comments and Suggestions for Authors

The article "DHQ-DETR: Learning Distributed and High-Quality Object Query for Dense Detection" provides a detailed exploration of the innovative contributions of the DHQ-DETR model in addressing the challenges of dense multi-scale object detection, with significant practical applications. However, there are some minor areas that require improvement:

1. Emphasis on Application Context: The manuscript addresses challenges in dense object detection, but it does not sufficiently highlight the importance of such tasks in specific contexts like remote sensing or UAV imagery within the abstract and introduction sections. It is recommended to include a discussion of these applications to better contextualize the work and underline its practical relevance.

2. Methodology Section Organization: The methodology section comprises multiple components (e.g., distribution modeling, the HQQS module, and assignment strategies), but the structure appears somewhat fragmented. It is suggested to introduce an overview diagram (such as a system architecture diagram) in Section 3.1 to help readers quickly grasp the relationships among the modules.

3. Clarity of Figure Descriptions: In Figure 4, the distribution decoding section on the right does not clearly illustrate its connection to distribution modeling. It is recommended to include a brief explanation in the figure caption about the functionality of this module and its relationship with the broader methodology.

4. Mathematical Notation Explanation: It is suggested to provide explanations of each variable used in the equations directly below the formulas. For instance, while Equation (3) is adequately described, Equations (1) and (2) lack similar explanations, which may hinder the reader's understanding.

5. Terminology Consistency: Inconsistent use of terminology, such as "distribution-based modeling" and "probabilistic distribution modeling," appears in different parts of the manuscript. It is advised to standardize the terminology throughout the paper to ensure clarity and coherence.

Author Response

Comments 1: Emphasis on Application Context: The manuscript addresses challenges in dense object detection, but it does not sufficiently highlight the importance of such tasks in specific contexts like remote sensing or UAV imagery within the abstract and introduction sections. It is recommended to include a discussion of these applications to better contextualize the work and underline its practical relevance.

Response 1: Thank you for your insightful comment. We understand the importance of framing our research within specific application contexts such as remote sensing and UAV imagery, particularly given the challenges inherent in dense object detection in these fields. To address this, we have enhanced the abstract and introduction sections to explicitly discuss these applications. This change can be found on Lines 1-5 and Lines 19-26. We have also replaced the CrowdHuman dataset with the DOTA dataset, which is a larger dataset with high-resolution aerial images.

Comments 2: Methodology Section Organization: The methodology section comprises multiple components (e.g., distribution modeling, the HQQS module, and assignment strategies), but the structure appears somewhat fragmented. It is suggested to introduce an overview diagram (such as a system architecture diagram) in Section 3.1 to help readers quickly grasp the relationships among the modules.

Response 2: Thank you for your insightful suggestion. We understand the importance of a well-organized methodology section to enhance clarity and comprehension for readers. To address this, we have introduced an overview diagram at the beginning of Section 3.1. This diagram visually outlines the relationships among the various components, including distribution modeling, the HQQS module, and assignment strategies. This section can be found on Lines 159-170.

Comments 3: Clarity of Figure Descriptions: In Figure 4, the distribution decoding section on the right does not clearly illustrate its connection to distribution modeling. It is recommended to include a brief explanation in the figure caption about the functionality of this module and its relationship with the broader methodology.

Response 3: Thank you for your insightful feedback on the clarity of Figure 4. We understand that figures must convey their intended message effectively, especially for complex methods. To address your comment, we have supplemented the motivation and explanation of the distribution decoding section to clarify its function and connection. The relevant content can be found in Lines 190-198, Lines 214-219 and Lines 221-231.

Comments 4: Mathematical Notation Explanation: It is suggested to provide explanations of each variable used in the equations directly below the formulas. For instance, while Equation (3) is adequately described, Equations (1) and (2) lack similar explanations, which may hinder the reader's understanding.

Response 4: Thank you for your valuable feedback regarding the mathematical notation in the document. We acknowledge that Equations (1) and (2) were not accompanied by detailed explanations of the variables, which may affect the clarity and understanding of these sections.

In response, we will revise the document to include explicit descriptions of each variable directly below these equations. The relevant description can be found in Lines 177-180 and Lines 184-185.

Comments 5: Terminology Consistency: Inconsistent use of terminology, such as "distribution-based modeling" and "probabilistic distribution modeling," appears in different parts of the manuscript. It is advised to standardize the terminology throughout the paper to ensure clarity and coherence.

Response 5: Thank you for pointing out the inconsistency in the terminology used throughout the manuscript. Maintaining consistent terminology is crucial for ensuring clarity and coherence, and we apologize for any confusion this may have caused. To address this issue, we will conduct a thorough review of the manuscript to identify and standardize the terms in question. We will choose "distribution-based modeling" and apply it consistently throughout the paper. Your comments are instrumental in helping us improve the quality of our work. Thank you for your assistance in enhancing the clarity of our manuscript.

Reviewer 3 Report

Comments and Suggestions for Authors

This paper proposes a novel object detection method. The proposed method was validated on public datasets to demonstrate its effectiveness. Detailed concerns are given below.

1. This paper mainly studies object detection in computer vision. However, I do not see any contribution to remote sensing, either in terms of methods or data related to remote sensing.

2. Although the text in Figure 1 has a large font, the resolution of Figure 1 is not satisfactory.

3. Compared with other object detection methods, what are the advantages of the proposed DHQ-DETR?

4. The motivation of the proposed method must be enhanced.

5. Algorithm 1 can be reduced to show the main steps. Some annotations can be added to Algorithm 1 to enhance the understanding.

6. We know that object detection is very hot in computer vision. Object detection algorithms are updated quickly. The proposed method should be compared with the state-of-the-art methods. Unfortunately, we did not see the authors compare the proposed method with the latest ones, at least not with algorithms published in 2024. The latest two papers [22, 42] were published in 2023.

7. The authors should conduct comparative experiments on larger data sets.

8. How efficient is the algorithm? Readers would like to see a discussion of its computational overhead or running time.

9. What are the limitations of the proposed method? In future research, how can we solve these deficiencies?

Author Response

Comments 1: This paper mainly studies object detection in computer vision. However, I do not see any contribution to remote sensing, either in terms of methods or data related to remote sensing.

Response 1:

Thank you for your feedback. While our main focus is on advancing dense multi-scale object detection methods, we appreciate your observation on the lack of specific contributions to remote sensing. Dense multi-scale object detection is one of the most important challenges in remote sensing imagery, and in the revised version we have emphasized the connection between our research subject and the challenges in remote sensing. This is reflected in the Abstract, Introduction, and Dataset sections. You can find the relevant content on Lines 1-5, Lines 19-26 and in Table 2. We have also replaced the CrowdHuman dataset with the DOTA dataset, which is a larger dataset with high-resolution aerial images.

Comments 2: Although the text in Figure 1 has a large font, the resolution of Figure 1 is not satisfactory.

Response 2:

Thank you for bringing this issue to our attention. We apologize for the low resolution of Figure 1. Figure 1 showed the comparison results of various methods on the COCO dataset (natural image object detection). Although it can verify the effectiveness of the proposed method, your Comment 1 indicates that we need to strengthen the contribution of this paper to the field of remote sensing. Therefore, after careful consideration, we removed Figure 1 rather than replacing it with a higher-resolution version. Once the revision is available, please check it and let us know whether it meets your expectations or still needs to be modified.

Comments 3: Compared with other object detection methods, what are the advantages of the proposed DHQ-DETR?

Response 3:

Thank you for your question regarding the advantages of DHQ-DETR compared to other object detection methods. The key advantages of our proposed DHQ-DETR include:

1. The decoding method that utilizes distribution modeling effectively handles dense, multi-scale objects by enhancing consistency, thereby improving the end-to-end detection method's applicability in the field of remote sensing.
2. The high-quality query selection module filters out redundant features from high-confidence objects, enabling the initialization of queries related to noisy or difficult-to-detect objects. This process aids the decoder in accurately matching and learning the offsets of the anchor boxes.
3. In the original DETR training process, the number of positive samples exactly matches the number of annotations in a training data batch. Consequently, DETR demands more data and training time compared to staged detection algorithms. Given the limited data in the current remote sensing dataset, which contains fewer images than the COCO dataset, incorporating an auxiliary detection head to bypass Hungarian matching presents a viable solution. Furthermore, our proposed sample assignment method is more accurate than existing methods.

Your inquiry helps us highlight these strengths in our paper and future presentations. Once again, we would like to express our sincere gratitude to you.
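To make the first advantage above more concrete, the sketch below shows how distribution-based decoding usually works in detectors trained with a Distribution Focal Loss: each box-edge offset is predicted as a discrete distribution over bins, and the decoded offset is the expectation of that distribution. This follows the general formulation popularized by Generalized Focal Loss and is an assumption about the overall idea only; the decoding used in DHQ-DETR may differ in detail. The names `softmax`, `decode_offset` and `bin_width` are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def decode_offset(logits, bin_width=1.0):
    """Decode one box-edge offset as the expectation over discrete bins.

    Bin i stands for the offset i * bin_width, so the result is
    sum_i p_i * i * bin_width with p = softmax(logits).
    """
    probs = softmax(logits)
    return sum(p * index * bin_width for index, p in enumerate(probs))


if __name__ == "__main__":
    # A distribution split evenly between bins 1 and 2 decodes to about 1.5.
    print(decode_offset([0.0, 10.0, 10.0, 0.0]))
```

A sharply peaked distribution behaves like a direct regression output, while a flat or bimodal one signals an ambiguous edge, which is common for densely packed or partly occluded objects.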

Comments 4: The motivation of the proposed method must be enhanced.

Response 4:

Thank you for your feedback. We understand the importance of a clear and compelling motivation for the proposed method. We will revise the manuscript to more clearly articulate these motivations and the practical challenges we aim to solve. This will include a stronger emphasis on the remote sensing impacts and benefits of our approach, ensuring that readers fully grasp the significance and necessity of our research. Your comment is invaluable, and we appreciate the opportunity to strengthen this aspect of our work. These improvements can be found on Lines 190-198, 221-231, 272-280 and 297-309.

Comments 5: Algorithm 1 can be reduced to show the main steps. Some annotations can be added to Algorithm 1 to enhance the understanding.

Response 5:

Thank you for your suggestion regarding Algorithm 1. We agree that clarity and conciseness are important for understanding complex procedures. To address your feedback, we will condense Algorithm 1 to highlight the main steps, ensuring that the core process is immediately clear to the reader. Additionally, we will add annotations to provide context and explanations for each step, which should enhance comprehension and make the algorithm more accessible. The relevant explanation is in Lines 317-323. We appreciate your input, which will help us improve the presentation of our work and facilitate better understanding for our audience.

Comments 6: We know that object detection is very hot in computer vision. Object detection algorithms are updated quickly. The proposed method should be compared with the state-of-the-art methods. Unfortunately, we did not see the authors compare the proposed method with the latest ones, at least not with algorithms published in 2024. The latest two papers [22, 42] were published in 2023.

Response 6:

Thank you for highlighting the need to compare our proposed method with the very latest state-of-the-art algorithms. We understand that the field of object detection evolves rapidly, and it is crucial to benchmark new approaches against the most current advancements. We have added the results of several new methods to the article and marked the publication year in the tables for reviewers and readers to compare and reference. These can be found in Table 2 and Table 3.

Comments 7: The authors should conduct comparative experiments on larger data sets.

Response 7:

Thank you for your suggestion regarding the use of larger datasets for comparative experiments. We agree that evaluating our method on bigger and more diverse datasets would provide a more robust assessment of its effectiveness and generalizability. We conducted experiments on the COCO dataset. However, no remote sensing dataset is as extensive as COCO. In response to your feedback, we revised our paper by replacing the original CrowdHuman dataset with the DOTAv1.0 dataset. This change aligns better with the theme of remote sensing and provides a more comprehensive and convincing dataset. You can view the relevant content in Lines 359-365 of the paper.

Comments 8: How efficient is the algorithm? Readers would like to see a discussion of its computational overhead or running time.

Response 8:

Thank you for highlighting the need to discuss the computational efficiency of our algorithm. We understand that readers are interested not only in the accuracy and effectiveness of a method but also in its practicality in terms of computational overhead and running time. Among the current open-source methods, it is challenging to standardize the server systems and environment configurations (such as Python versions, CUDA versions, GPU driver versions, and hardware configurations) used by each method. This variability complicates the measurement and comparison of inference times. Nevertheless, we can estimate the computational cost of each method by examining the number of parameters and floating-point operations (FLOPs). The relevant data are shown in Table 1. Notably, DHQ-DETR is derived from RT-DETR and is capable of achieving real-time detection on high-performance platforms.
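As a hardware-independent illustration of the parameter and FLOPs estimate described above, the sketch below counts both quantities for a single 2D convolution layer. It is a generic back-of-the-envelope calculation, not the procedure used to produce Table 1. The function name `conv2d_cost` and the example layer sizes are hypothetical, and the convention of counting one multiply-accumulate (MAC) as two FLOPs is an assumption, since some tools report MACs as FLOPs.

```python
def conv2d_cost(c_in, c_out, kernel, h_out, w_out, bias=True):
    """Estimate parameters, MACs and FLOPs of one dense 2D convolution.

    Assumes a square kernel and no channel groups.
    """
    weights_per_output_position = kernel * kernel * c_in * c_out
    params = weights_per_output_position + (c_out if bias else 0)
    # Every output position needs one MAC per weight.
    macs = weights_per_output_position * h_out * w_out
    # Assumed convention: 1 MAC = 1 multiply + 1 add = 2 FLOPs.
    flops = 2 * macs
    return params, macs, flops


if __name__ == "__main__":
    params, macs, flops = conv2d_cost(3, 64, 3, 320, 320)
    print(f"params={params}, MACs={macs}, FLOPs={flops}")
```

Summing such per-layer counts over a whole network gives a cost figure that does not depend on the GPU, driver or framework version, which is why it is a common substitute when measured inference times are not comparable.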

Comments 9: What are the limitations of the proposed method? In future research, how can we solve these deficiencies?

Response 9:

Thank you for your thoughtful question regarding the limitations of our proposed method and potential future research directions. It is important to acknowledge the constraints of DHQ-DETR to provide a balanced view of our work. By investigating these limitations and potential solutions, future research can build on our foundation to develop even more robust and versatile object detection systems. Your feedback is crucial for guiding these future endeavors, and we appreciate your input on these aspects. We have included a new section titled “Discussion” following the presentation of experimental results, which is in Lines 416-448. In this section, we elaborate on the limitations and future developments of DHQ-DETR.
