Peer-Review Record

A Lightweight Object Detection Algorithm for Remote Sensing Images Based on Attention Mechanism and YOLOv5s

by Pengfei Liu, Qing Wang *, Huan Zhang, Jing Mi and Youchen Liu
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Remote Sens. 2023, 15(9), 2429; https://doi.org/10.3390/rs15092429
Submission received: 4 April 2023 / Revised: 29 April 2023 / Accepted: 4 May 2023 / Published: 5 May 2023
(This article belongs to the Special Issue Machine Learning and Image Processing for Object Detection)

Round 1

Reviewer 1 Report

Dear Authors

 

The paper titled “A lightweight object detection algorithm in Remote Sensing images based on attention mechanism and YOLOv5s” proposes a lightweight object detection algorithm based on an attention mechanism and YOLOv5s. First, a DD-head module is built from a decoupled head and depthwise convolution to replace the coupled head of YOLOv5s, alleviating the negative impact of conflicts between the classification and regression tasks. Then, an SPPCSPG module based on the SPPCSPC and GSConv modules is constructed to replace the SPPF module of YOLOv5s, which improves the utilization of multi-scale information.

The paper addresses important issues; however, it needs further improvement.

 

1. Extensive English editing is required throughout the manuscript.

2. The abstract is very long and confusing.

3. Why did the authors use the RSOD and DIOR datasets? Other datasets, including OID and CIFAR, could also be used.

4. YOLOv7 is currently in use; the authors should justify why v5 is used in the present investigation.

5. Introduction, first paragraph: the opening of the section should explain what makes AI so prominent and which recent applications it has in various areas, to justify its use in the present work. I suggest adding references such as deep learning-based modeling of groundwater storage change; CDLSTM: a novel model for climate change forecasting; analysis of environmental factors using AI and ML methods; and SMOTEDNN: a novel model for air pollution forecasting and AQI classification.

6. The authors can also add "target object detection from unmanned aerial vehicle (UAV) images based on an improved YOLO algorithm."

7. Attention needs some original references, including Vaswani et al.'s work, which developed the Transformer.

8. The limitations and future scope should be added with more clarity.

9. The authors need to provide the merits of this study vs. other related studies.

10. An inter-comparison or comparison with other studies is missing; please add it.

 

Minor editing is required.

Author Response

Reviewer #1:

The paper titled “A lightweight object detection algorithm in Remote Sensing images based on attention mechanism and YOLOv5s” proposes a lightweight object detection algorithm based on an attention mechanism and YOLOv5s. First, a DD-head module is built from a decoupled head and depthwise convolution to replace the coupled head of YOLOv5s, alleviating the negative impact of conflicts between the classification and regression tasks. Then, an SPPCSPG module based on the SPPCSPC and GSConv modules is constructed to replace the SPPF module of YOLOv5s, which improves the utilization of multi-scale information.

The paper addresses important issues; however, it needs further improvement.

Response:

Dear reviewer:

Thank you for your valuable suggestions and opinions on our work. These suggestions are very valuable and will help improve the quality and reliability of our research. We have made revisions to the article, and with the help of your suggestions, this paper has been significantly improved compared with the original manuscript. Thank you again for your patient review and valuable suggestions.

 

Question: Extensive English editing is required throughout the manuscript.

Response: Thanks for your suggestions. We have submitted the manuscript to MDPI for English polishing.

 

Question: The abstract is very long and confusing.

Response: Thanks for your suggestions. We have rewritten the abstract.

 

Question: Why did the authors use the RSOD and DIOR datasets? Other datasets, including OID and CIFAR, could also be used.

Response: Thanks for your suggestions. The model constructed in this article is aimed at object detection in remote sensing images, so two commonly used remote sensing datasets, RSOD and DIOR, were selected. To demonstrate that the constructed model is suitable not only for object detection in remote sensing images but also for object detection on conventional datasets, this study also selected the VOC and COCO datasets to show that the model constructed in this paper still yields improvements on conventional datasets. Therefore, the model constructed in this article is not limited to the four datasets selected in this study; it should also bring improvements on other datasets, albeit to varying degrees.

 

Question: YOLOv7 is currently in use; the authors should justify why v5 is used in the present investigation.

Response: Thanks for your suggestions. The core ideas of this article are the construction of the SPPCSPG and DD-head modules and the exploration of combining the SA module with YOLO-series models. Considering the increasing pursuit of lightweight design in object detection, this article uses the GSConv module to replace the traditional Conv module; to address the drawbacks of the traditional up-sampling process, the CARAFE module is introduced, and a remote-sensing-oriented object detection algorithm is constructed based on the YOLOv5s model. The main innovation of this article lies in how to construct new modules and how to embed the attention mechanism in the YOLO algorithm. Therefore, the ideas of this article can also be applied to the latest YOLO-series algorithms, for example by constructing object detection algorithms for drone images based on the YOLOv7 or YOLOv8 algorithms.
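[Editor's note] For orientation only, the sketch below illustrates the DD-head idea described in this response: a decoupled head whose classification and regression branches use depthwise-separable convolutions instead of standard convolutions. It is a minimal, hypothetical PyTorch sketch, not the authors' implementation; the module names, channel widths, and anchor count are assumptions.

```python
# Hypothetical sketch of a depthwise-separable decoupled head ("DD-head"-style);
# NOT the authors' code: layer names, channel widths, and anchor count are assumed.
import torch
import torch.nn as nn

class DWConvBlock(nn.Module):
    """Depthwise 3x3 conv followed by pointwise 1x1 conv (depthwise-separable)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.dw = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False)
        self.pw = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))

class DecoupledDWHead(nn.Module):
    """Separate classification and regression branches on a shared 1x1 stem."""
    def __init__(self, c_in, num_classes, num_anchors=3, width=128):
        super().__init__()
        self.stem = nn.Conv2d(c_in, width, 1)
        self.cls_branch = nn.Sequential(DWConvBlock(width, width), DWConvBlock(width, width))
        self.reg_branch = nn.Sequential(DWConvBlock(width, width), DWConvBlock(width, width))
        self.cls_pred = nn.Conv2d(width, num_anchors * num_classes, 1)
        self.box_pred = nn.Conv2d(width, num_anchors * 4, 1)   # box offsets
        self.obj_pred = nn.Conv2d(width, num_anchors * 1, 1)   # objectness

    def forward(self, x):
        x = self.stem(x)
        cls_out = self.cls_pred(self.cls_branch(x))
        reg_feat = self.reg_branch(x)
        return cls_out, self.box_pred(reg_feat), self.obj_pred(reg_feat)

# Example: one 80x80 feature map from a YOLOv5s-like neck (128 channels assumed).
if __name__ == "__main__":
    head = DecoupledDWHead(c_in=128, num_classes=20)
    cls, box, obj = head(torch.randn(1, 128, 80, 80))
    print(cls.shape, box.shape, obj.shape)
```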

 

Question: Introduction, first paragraph: the opening of the section should explain what makes AI so prominent and which recent applications it has in various areas, to justify its use in the present work. I suggest adding references such as deep learning-based modeling of groundwater storage change; CDLSTM: a novel model for climate change forecasting; analysis of environmental factors using AI and ML methods; and SMOTEDNN: a novel model for air pollution forecasting and AQI classification.

Response: Thanks for your suggestions. We have added a description of artificial intelligence in the introduction section and added the following references.

[1] Haq, M.A.; Ahmed, A.; Khan, I.; Gyani, J.; Mohamed, A.; Attia, E.; Mangan, P.; Pandi, D. Analysis of environmental factors using AI and ML methods. Sci Rep. 2022, 12, 13267.

[2] Haq, M.A.; Jilani, A.K.; Prabu, P. Deep Learning Based Modeling of Groundwater Storage Change. Cmc-Comput. Mat. Contin. 2022, 70, 4599-617.

[3] Haq, M.A. CDLSTM: A Novel Model for Climate Change Forecasting. Cmc-Comput. Mat. Contin. 2022, 71, 2363-81.

[4] Haq, M.A. SMOTEDNN: A Novel Model for Air Pollution Forecasting and AQI Classification. Cmc-Comput. Mat. Contin. 2022, 71, 1403-25.

 

Question: The authors can also add "target object detection from unmanned aerial vehicle (UAV) images based on an improved YOLO algorithm."

Response: Thanks for your suggestions. The algorithm constructed in this article is mainly aimed at object detection in remote sensing images. To demonstrate the robustness of the constructed algorithm, detection results on a general object detection dataset are also included. In subsequent research, the authors will construct algorithms for unmanned aerial vehicle (UAV) images.

 

Question: Attention needs some original references, including Vaswani et al.'s work, which developed the Transformer.

Response: Thanks for your suggestions. We have added the original references in the attention mechanism section.

 

Question: The limitations and future scope should be added with more clarity.

Response: Thanks for your suggestions. We have added, in the conclusion section, the applicable scenarios and limitations of the model constructed in this article, as well as future research directions.

 

Question: The authors need to provide the merits of this study vs. other related studies.

Response: Thanks for your suggestions. We conducted comparative experiments comparing the detection performance of the model constructed in this paper with existing advanced models on the DIOR dataset (Table 5), on the VOC dataset (Tables 6-7), and on the COCO dataset (Table 8). These comparative experiments on three datasets show that the model constructed in this paper performs better on remote sensing datasets than current advanced detection algorithms. At the same time, the model constructed in this article has good robustness.

 

Question: An inter-comparison or comparison with other studies is missing; please add it.

Response: Thanks for your suggestions. We conducted comparative experiments comparing the detection performance of the model constructed in this paper with existing advanced models on the DIOR dataset (Table 5), on the VOC dataset (Tables 6-7), and on the COCO dataset (Table 8). These comparative experiments on three datasets show that the model constructed in this paper performs better on remote sensing datasets than current advanced detection algorithms. At the same time, the model constructed in this article has good robustness.

 

Author Response File: Author Response.docx

Reviewer 2 Report

The authors have presented their work with the title “A lightweight object detection algorithm in Remote Sensing images based on attention mechanism and YOLOv5s”.

   The research work contributed by the authors of this paper is mainly reflected in the following points:

·  To address this problem, the authors propose a lightweight object detection algorithm based on the YOLOv5s network, which combines the Shuffle Attention (SA) module, the DD-head module, the Content-Aware Reassembly of Features (CARAFE) module, the GSConv module, and the SPPCSPG module to improve the detection accuracy of the model while meeting real-time requirements.

·  Depthwise convolution is used by the authors to replace the standard convolution in the decoupled head module to construct a new detection head, the DD-head, which mitigates the negative impact of classification and regression task conflicts while reducing the parameter volume of the decoupled head.

·  The authors update the SPPCSPC module, utilize the design principle of the GS bottleneck, and replace the CBS module in the SPPCSPC module with the GSConv module to design a lightweight SPPCSPG module, which is introduced into the backbone structure to optimize the YOLOv5s network model (a brief sketch of GSConv follows after this list).

·  The effect of embedding the SA module in the network backbone, neck, and head regions was studied, and the SA module is ultimately embedded in the head region to enhance the spatial and channel attention of the feature map, thereby improving the accuracy of multi-scale object detection.

·  Experimental results show that the algorithm presented in the work outperforms the original YOLOv5s algorithm in multi-scale object detection performance while meeting real-time requirements.

·  English language and style are fine; a minor spell check is required. Check whether all the references are cited in the running text.

·  I request the authors to avoid words like I, we, and you in the running text; the paper should be written in the third person.

·  I request the authors to add the following papers, which are relevant to your work, to the references and cite them in the running text: https://doi.org/10.1007/s12524-020-01265-7, https://doi.org/10.1155/2022/1645658, https://doi.org/10.1007/978-981-16-3690-5_156, https://doi.org/10.1007/s11042-021-11649-7, and https://doi.org/10.1016/j.bspc.2022.104152.
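[Editor's note] As a quick orientation for the GSConv point above: one common formulation of GSConv applies a standard convolution to half the output channels, a depthwise convolution to that half, then concatenates and channel-shuffles the result. The PyTorch sketch below is a hedged illustration of that formulation under assumed kernel sizes and defaults, not the code used in the paper.

```python
# Hypothetical GSConv sketch (after the published "Slim-neck by GSConv" idea);
# not the paper's exact implementation: kernel sizes and the shuffle layout are assumed.
import torch
import torch.nn as nn

class ConvBNAct(nn.Module):
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class GSConv(nn.Module):
    """Half standard conv + half depthwise conv, concatenated and channel-shuffled."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.dense = ConvBNAct(c_in, c_half, k, s)        # standard convolution branch
        self.cheap = nn.Sequential(                        # depthwise convolution branch
            nn.Conv2d(c_half, c_half, 5, 1, padding=2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )

    def forward(self, x):
        y1 = self.dense(x)
        y2 = self.cheap(y1)
        y = torch.cat([y1, y2], dim=1)
        # channel shuffle: interleave the two halves so information mixes across branches
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)

# Example: drop-in replacement for a CBS block on a 256-channel feature map.
if __name__ == "__main__":
    m = GSConv(256, 256)
    print(m(torch.randn(1, 256, 40, 40)).shape)  # torch.Size([1, 256, 40, 40])
```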

Author Response

Reviewer #2:

The research work contributed by the authors of this paper is mainly reflected in the following points:

To address this problem, the authors propose a lightweight object detection algorithm based on the YOLOv5s network, which combines the Shuffle Attention (SA) module, the DD-head module, the Content-Aware Reassembly of Features (CARAFE) module, the GSConv module, and the SPPCSPG module to improve the detection accuracy of the model while meeting real-time requirements. Depthwise convolution is used to replace the standard convolution in the decoupled head module to construct a new detection head, the DD-head, which mitigates the negative impact of classification and regression task conflicts while reducing the parameter volume of the decoupled head. The authors update the SPPCSPC module, utilize the design principle of the GS bottleneck, and replace the CBS module in the SPPCSPC module with the GSConv module to design a lightweight SPPCSPG module, which is introduced into the backbone structure to optimize the YOLOv5s network model. The effect of embedding the SA module in the network backbone, neck, and head regions was studied, and the SA module is ultimately embedded in the head region to enhance the spatial and channel attention of the feature map, thereby improving the accuracy of multi-scale object detection. Experimental results show that the algorithm presented in the work outperforms the original YOLOv5s algorithm in multi-scale object detection performance while meeting real-time requirements.
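[Editor's note] As an aside on the Shuffle Attention module mentioned in the summary above, the sketch below gives a minimal PyTorch rendition of the SA idea: grouped channel and spatial attention branches followed by a channel shuffle. It is an illustration based on the published SA-Net design, not the authors' code; the group count and parameter shapes are assumptions.

```python
# Hypothetical Shuffle Attention (SA) sketch, loosely after SA-Net;
# not the authors' implementation: group count and parameter shapes are assumed.
import torch
import torch.nn as nn

class ShuffleAttention(nn.Module):
    """Grouped channel + spatial attention followed by a channel shuffle."""
    def __init__(self, channels, groups=8):
        super().__init__()
        self.groups = groups
        c = channels // (2 * groups)          # channels per half-branch
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.gn = nn.GroupNorm(c, c)
        self.cw = nn.Parameter(torch.zeros(1, c, 1, 1))  # channel-branch scale
        self.cb = nn.Parameter(torch.ones(1, c, 1, 1))   # channel-branch shift
        self.sw = nn.Parameter(torch.zeros(1, c, 1, 1))  # spatial-branch scale
        self.sb = nn.Parameter(torch.ones(1, c, 1, 1))   # spatial-branch shift
        self.sigmoid = nn.Sigmoid()

    @staticmethod
    def channel_shuffle(x, groups):
        b, c, h, w = x.shape
        return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

    def forward(self, x):
        b, c, h, w = x.shape
        x = x.view(b * self.groups, c // self.groups, h, w)
        x0, x1 = x.chunk(2, dim=1)                        # two branches per group
        x0 = x0 * self.sigmoid(self.avg_pool(x0) * self.cw + self.cb)   # channel attention
        x1 = x1 * self.sigmoid(self.gn(x1) * self.sw + self.sb)         # spatial attention
        out = torch.cat([x0, x1], dim=1).view(b, c, h, w)
        return self.channel_shuffle(out, 2)               # mix the two branches

if __name__ == "__main__":
    sa = ShuffleAttention(128, groups=8)
    print(sa(torch.randn(1, 128, 40, 40)).shape)          # torch.Size([1, 128, 40, 40])
```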

Response:

Dear reviewer:

Thank you for your valuable suggestions and opinions on our work. These suggestions are very valuable and will help improve the quality and reliability of our research. We have made revisions to the article, and with the help of your suggestions, this paper has been significantly improved compared with the original manuscript. Thank you again for your patient review and valuable suggestions.

 

Question: English language and style are fine; a minor spell check is required. Check whether all the references are cited in the running text.

Response: Thanks for your suggestions. We have submitted the manuscript to MDPI for English polishing. We have organized the references again, and all cited references are listed in the References section at the end of the main text.

 

Question: I request the authors to avoid words like I, we, and you in the running text; the paper should be written in the third person.

Response: Thanks for your suggestions. We have rechecked the entire manuscript and made modifications to the inappropriate descriptions.

 

Question: I request the authors to add the following papers, which are relevant to your work, to the references and cite them in the running text: https://doi.org/10.1007/s12524-020-01265-7, https://doi.org/10.1155/2022/1645658, https://doi.org/10.1007/978-981-16-3690-5_156, https://doi.org/10.1007/s11042-021-11649-7, and https://doi.org/10.1016/j.bspc.2022.104152.

Response: Thanks for your suggestions. We have added the following references to the paper.

[1] Merugu, S.; Tiwari, A.; Sharma, S.K. Spatial–Spectral Image Classification with Edge Preserving Method. J. Indian Soc. Remote Sens. 2021, 49, 703–711. https://doi.org/10.1007/s12524-020-01265-7.

[2] Shaik, A.S.; Karsh, R.K.; Islam, M.; Singh, S.P. A Secure and Robust Autoencoder-Based Perceptual Image Hashing for Image Authentication. Wirel. Commun. Mob. Comput. 2022, 2022, Article ID 1645658. https://doi.org/10.1155/2022/1645658.

[3] Shaik, A.S.; Karsh, R.K.; Suresh, M.; Gunjan, V.K. LWT-DCT Based Image Hashing for Tampering Localization via Blind Geometric Correction. In ICDSMLA 2020; Kumar, A., Senatore, S., Gunjan, V.K., Eds.; Lecture Notes in Electrical Engineering, vol. 783; Springer: Singapore, 2022. https://doi.org/10.1007/978-981-16-3690-5_156.

[4] Shaik, A.S.; Karsh, R.K.; Islam, M.; et al. A Review of Hashing Based Image Authentication Techniques. Multimed. Tools Appl. 2022, 81, 2489–2516. https://doi.org/10.1007/s11042-021-11649-7.

[5] Shaheen, H.; Ravikumar, K.; Lakshmipathi Anantha, N.; Uma Shankar Kumar, A.; Jayapandian, N.; Kirubakaran, S. An efficient classification of cirrhosis liver disease using hybrid convolutional neural network-capsule network. Biomed. Signal Process. Control. 2023, 80, 104152.

 

Author Response File: Author Response.docx

Reviewer 3 Report

The paper describes a new object detection framework based on YOLOv5s and an attention mechanism. The overall impression is of a paper that sounds technically correct, with a wide introduction and a satisfying related works section. The methods are well described and easy to understand. The experiments are adequate, together with a thorough comparison with the state-of-the-art.

Few general comments:

- To my understanding, the algorithm is not really “lightweight” as the authors claim in the title. Compared to YOLOv5, an improvement of roughly 2.5% in accuracy is reached at the expense of +60% detection time. Nonetheless, the algorithm seems able to fulfill the constraints for working in real time in most computer vision applications, and the results are promising.

- The abstract is a bit long, with a lot of acronyms that make the reading a bit heavy. I do not know if the abstract could be changed, so please take it as a suggestion for the next submission.

Other comments:

- Table VI: it took me a while to understand that the table is broken into two rows. The authors could highlight this in the table caption.

- Since the paper makes great use of acronyms, I would avoid using them in the figure/table captions as well, especially wherever the available space permits the extended definition.

- It would be nice to have some pictures/examples representing the output of the proposed object detection method.

- The bibliography section is exhaustive. If possible, I would recommend not using dot-surname, dot-name for long lists of authors. For example, take reference 18: M., H.Y.; G., P.; J., M.; S., G.; M., S.; R., Z.; A., K.D.; R., M. Automated Breast Ultrasound Lesions Detection Using Convolutional Neural Networks. IEEE J. Biomed. Health Inform. 2018, 22, 1218-26. I would prefer, for better reading: Moi, H. Y. et al. Automated Breast Ultrasound Lesions Detection Using Convolutional Neural Networks. IEEE J. Biomed. Health Inform. 2018, 22, 1218-26. The suggestion is meant for those papers with a very long list of co-authors.

Author Response

Reviewer #3:

The paper describes a new object detection framework based on YOLOv5s and an attention mechanism. The overall impression is of a paper that sounds technically correct, with a wide introduction and a satisfying related works section. The methods are well described and easy to understand. The experiments are adequate, together with a thorough comparison with the state-of-the-art. To my understanding, the algorithm is not really “lightweight” as the authors claim in the title. Compared to YOLOv5, an improvement of roughly 2.5% in accuracy is reached at the expense of +60% detection time. Nonetheless, the algorithm seems able to fulfill the constraints for working in real time in most computer vision applications, and the results are promising.

Response:

Dear reviewer:

Thank you for your valuable suggestions and opinions on our work. These suggestions are very valuable and will help improve the quality and reliability of our research. We have made revisions to the article, and with the help of your suggestions, this paper has been significantly improved compared with the original manuscript. Thank you again for your patient review and valuable suggestions.

 

Question: The abstract is a bit long, with a lot of acronyms that make the reading a bit heavy. I do not know if the abstract could be changed, so please take it as a suggestion for the next submission.

Response: Thanks for your suggestions. We have rewritten the abstract and clearly described all abbreviations in the abstract.

 

Question: Table VI: it took me a while to understand that the table is broken into two rows. The authors could highlight this in the table caption.

Response: Thanks for your suggestions. We have changed the name of the table to make it clearer for readers.

 

Question: Since the paper makes great use of acronyms, I would avoid using them in the figure/table captions as well, especially wherever the available space permits the extended definition.

Response: Thanks for your suggestions. We have changed all abbreviations that appear in the figure/table captions to their full descriptions.

 

Question: It would be nice to have some pictures/examples representing the output of the proposed object detection method.

Response: Thanks for your suggestions. Figures 10 and 11 show detection examples of the constructed model and the original model on the RSOD and DIOR datasets, demonstrating that the detection performance of the model constructed in this paper is superior to that of the original model on remote sensing images.

 

Question: The bibliography section is exhaustive. If possible, I would recommend not using dot-surname, dot-name for long lists of authors. For example, take reference 18: M., H.Y.; G., P.; J., M.; S., G.; M., S.; R., Z.; A., K.D.; R., M. Automated Breast Ultrasound Lesions Detection Using Convolutional Neural Networks. IEEE J. Biomed. Health Inform. 2018, 22, 1218-26. I would prefer, for better reading: Moi, H. Y. et al. Automated Breast Ultrasound Lesions Detection Using Convolutional Neural Networks. IEEE J. Biomed. Health Inform. 2018, 22, 1218-26. The suggestion is meant for those papers with a very long list of co-authors.

Response: Thanks for your suggestions. We have reorganized the references according to the journal's reference format. Although some papers have more than three authors, we still list all author information in the references according to the journal's formatting requirements.

 

Author Response File: Author Response.docx

Reviewer 4 Report

The paper describes a way of enriching the YOLO model with alternative/additional blocks in order to better fit a specific task. The description is clear enough, and the text is easy to follow. Many numerical experiments are presented. Generally, this is quality work.

There are the following minor issues.

1. Not all abbreviations are deciphered in the text; for instance, VoVGSCSP is not. Also, it is not advisable to use abbreviations in an abstract. The abstract should present the main idea and main impact, but not the fine details.

2. Four databases are used (RSOD, DIOR, COCO and PASCAL). References are given, and the reader can follow them and discover the properties of the databases. However, I think it is better to give some information (at least sample data size, number of samples, and number of classes) in one additional table, to ease comprehension.

3. Also, one would like to see the number of parameters (and possibly the number of layers) for all of the neural networks in Table 6 and the following tables. Information on FPS and GPU is not easily comparable; a more general way of comparing is the number of parameters/layers.

4. Literature items starting from item 18 have distorted author names.

After fixing these issues the paper can be published.

 

Author Response

Reviewer #4:

The paper describes a way of enriching the YOLO model with alternative/additional blocks in order to better fit a specific task. The description is clear enough, and the text is easy to follow. Many numerical experiments are presented. Generally, this is quality work.

Response:

Dear reviewer:

Thank you for your valuable suggestions and opinions on our work. These suggestions are very valuable and will help improve the quality and reliability of our research. We have made revisions to the article, and with the help of your suggestions, this paper has been significantly improved compared with the original manuscript. Thank you again for your patient review and valuable suggestions.

 

Question: Not all abbreviations are deciphered in the text; for instance, VoVGSCSP is not. Also, it is not advisable to use abbreviations in an abstract. The abstract should present the main idea and main impact, but not the fine details.

Response: Thanks for your suggestions. We have rewritten the abstract and clearly described all abbreviations in the text.

 

Question: Four databases are used (RSOD, DIOR, COCO and PASCAL). References are given, and the reader can follow them and discover the properties of the databases. However, I think it is better to give some information (at least sample data size, number of samples, and number of classes) in one additional table, to ease comprehension.

Response: Thanks for your suggestions. We have added a description of the dataset used in this study in the experimental dataset section.

 

Question: Also, one would like to see the number of parameters (and possibly the number of layers) for all of the neural networks in Table 6 and the following tables. Information on FPS and GPU is not easily comparable; a more general way of comparing is the number of parameters/layers.

Response: Thanks for your suggestions. We also realize that FPS cannot fully characterize the detection-speed performance of the model. However, after reviewing the literature, we found that some authors did not provide the parameter counts of their models, and FPS can still reflect the detection performance of a model to a certain extent. Therefore, we recommend retaining the FPS metric.
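[Editor's note] As a side note on the parameter-count metric the reviewer suggests, counting trainable parameters is straightforward when the code is available; the generic PyTorch helper below (not tied to any model from the paper) shows one way to report it.

```python
# Generic helper for reporting a model's trainable parameter count (illustrative only).
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Return the number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

if __name__ == "__main__":
    # Toy stand-in model; substitute any detector instance (e.g., a YOLOv5s model).
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1),
        nn.SiLU(),
        nn.Conv2d(16, 32, 3, padding=1),
    )
    print(f"{count_parameters(model) / 1e6:.3f} M trainable parameters")
```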

 

Question: Literature items starting from item 18 have distorted author names.

Response: Thanks for your suggestions. We have reorganized the references according to the literature format of the journal.

 

 

 

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

All the comments have been addressed.

A minor spell check is required.
