Article
Peer-Review Record

Optical Remote Sensing Ship Recognition and Classification Based on Improved YOLOv5

Remote Sens. 2023, 15(17), 4319; https://doi.org/10.3390/rs15174319
by Jun Jian 1,*, Long Liu 1, Yingxiang Zhang 1, Ke Xu 1 and Jiaxuan Yang 1,2
Reviewer 1:
Reviewer 2:
Reviewer 3:
Reviewer 4: Anonymous
Reviewer 5: Anonymous
Reviewer 6: Anonymous
Submission received: 31 May 2023 / Revised: 19 August 2023 / Accepted: 25 August 2023 / Published: 1 September 2023
(This article belongs to the Special Issue Remote Sensing for Maritime Monitoring and Vessel Identification)

Round 1

Reviewer 1 Report

  • 1. The first letter of each parameter in Table 2 should be capitalized.

  • 2. Why are the YOLOv6 and YOLOv8 experiments missing from Table 4?
  • 3. Please specify the sizes of the training and test sets.
  • 4. The experimental section only describes the results; the role of each improved component of Improved-YOLOv5 is not discussed.

I suggest double-checking the English style and grammar.

Author Response

Question 1. The first letter of each parameter in Table 2 should be capitalized.

Response: We sincerely thank the reviewer for the careful check, and we apologize for our careless mistakes. As suggested, we have checked the capitalization of all letters in the tables and made them consistent. Please see the revised manuscript for details.

Question 2. Why are the YOLOv6 and YOLOv8 experiments missing from Table 4?

Response: We sincerely thank the reviewer for the careful reading. The reasons are as follows. The YOLOv6 model is an algorithm developed by a Chinese Internet company specifically for its employees. It lacks good generalizability and is not suitable for theoretical research. YOLOv8, the latest version of the YOLO series, was released recently and has attracted significant attention from researchers. After submitting the manuscript, we promptly ran the relevant experiments and incorporated the results into the revised manuscript. For details, please refer to the revised manuscript.

Question 3. Please specify the number of training and test sets.

Response: Thank you for the reviewer's reminder. We forgot to mention the dataset partitioning ratio in the manuscript. In practice, we set the ratio of training set to validation set as 8:2 before conducting the experiments. We have added this information in Section 5.1 Experimental Dataset of the revised manuscript.
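For readers who want to reproduce an 8:2 partition like the one described in this response, a minimal sketch follows. It is not the authors' script: the file names, the dataset size of 1000, the fixed seed, and the helper name `split_train_val` are all assumptions made for illustration.

```python
import random

def split_train_val(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into disjoint (train, val) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names; the real dataset size is not stated in the response.
image_names = [f"img_{i:04d}.jpg" for i in range(1000)]
train, val = split_train_val(image_names)
print(len(train), len(val))  # 800 200
```

Because the two lists are slices of one shuffled sequence, no image can appear in both partitions.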

Question 4. The experimental section only describes the results; the role of each improved component of Improved-YOLOv5 is not discussed.

Response: Thank you very much for the reviewer's reminder. In the ablation experiments in section 5.5, we only discussed the effects of CBAM, BiFPN, and the lightweight GSConv structure, while neglecting the impact of the W-IoU loss function. This part has been added in the first paragraph in section 5.5.

Reviewer 2 Report

Overall, the paper makes a moderate contribution to the field. Some improvements proposed by this reviewer are:

- Performance metrics of Yolov5 (original/not improved) should be included in Table 4, Figure 15 and Figure 16 to further support the authors' claim that their work does improve the performance of Yolov5.

- Figures 14 and 16 are rather difficult to understand, mainly because the objects are small and the colour coding is a bit unclear.

The use of English is OK, with some editing required. Some typos need to be corrected. Also, the presentation of variables in the text is inconsistent (some are typed in italics, some are not), as is the typesetting of the equations (see for example Equation 3).

Author Response

Question 1. Performance metrics of Yolov5 (original/not improved) should be included in Table 4, Figure 15 and Figure 16 to further support the authors' claim that their work does improve the performance of Yolov5.

Response: We think this is an excellent suggestion. We have added the experimental results and detection performance of YOLOv5 in Table 4 and Figure 15. This comprehensive comparison helps validate the performance of the Improved-YOLOv5 algorithm. Since the loss and mAP of YOLOv5 and Improved-YOLOv5 have been compared separately in Figure 12, the comparison is not performed in Figure 16.

Question 2. Figures 14 and 16 are rather difficult to understand, mainly because the objects are small and the colour coding is a bit unclear.

Response: We sincerely thank the reviewer for the careful reading. Figures 14 and 16 lack clarity because the original optical remote sensing images are large and the ship targets are small, so we displayed enlarged crops of the original images, which lowered the resolution. To improve the visual effect, we have adjusted the thickness of the detection boxes and the font size. Additionally, we now use red boxes to indicate missed detections, blue boxes to indicate false detections, and purple boxes to indicate duplicate detections.

Question 3. The use of English is OK, with some editing required. Some typos need to be corrected.

Response: Thank you for your advice. We have tried our best to check and revise the whole language and correct the grammar. At the same time, we also invited a friend who is good at English and knows this field to revise the language of the paper. Finally, we checked and corrected the language of the whole paper. We sincerely hope that the revised manuscript will be accepted.

Question 4. Also, the presentation of variables in the text is inconsistent (some are typed in italics, some are not), as is the typesetting of the equations (see for example Equation 3).

Response: We sincerely thank the reviewer for the careful check, and we apologize for our careless mistakes. As suggested, we have checked and corrected all the variables in the text and reformatted all the equations. Please see the revised manuscript for details.

Reviewer 3 Report

Dear Author,

Thank you for submitting your work! We sincerely appreciate your interest in contributing to the field of remote sensing. Your paper presents ship classifications and recognition using YOLOv5. The topic aligns well with the focus of this journal. However, upon reviewing your submission, I have the following observations:

The implications of the research are not adequately discussed by the authors.

Furthermore, the motivation behind this paper is not clearly explained. To enhance the structure of your paper, I suggest separating the literature review into its own section and making the gap in the literature evident in the Introduction. This will help establish the motivation for your proposed work.

In addition, your results should explicitly state how they improve upon existing state-of-the-art techniques, including your own previous work if any, particularly in tabular form.

Please address these points in your revision to enhance the quality and impact of your paper. We look forward to reviewing the updated version of your manuscript.

We recommend that the authors thoroughly review the spelling and grammar throughout the entire manuscript. This will help ensure the accuracy and clarity of the content.

Author Response

Question 1. The implications of the research are not adequately discussed by the authors.

Response: We sincerely thank the reviewer for the careful reading. It is possible that we did not express the significance of the study clearly. In the third paragraph of Section 1, we briefly introduce the research significance of ship target detection from both civil and military perspectives, which we believe is in line with the theme of this special issue, "Remote Sensing for Maritime Monitoring and Vessel Identification".

Question 2. Furthermore, the motivation behind this paper is not clearly explained. To enhance the structure of your paper, I suggest separating the literature review into its own section and making the gap in the literature evident in the Introduction. This will help establish the motivation for your proposed work.

Response: We think this is an excellent suggestion, and according to your suggestion, we have separated the literature review into its own section and clearly pointed out the gaps in the literature and the motivation for writing. Please see Section 2 of the revised manuscript for details of the modification.

Question 3. In addition, your results should explicitly state how they improve upon existing state-of-the-art techniques, including your own previous work if any, particularly in tabular form.

Response: Our improvements to the YOLOv5 model are described mainly in Section 4. In the early stage we also ran about 40 kinds of experiments, each taking about 18 hours to train. The methods we tried include adding different attention mechanisms, replacing the loss function, improving non-maximum suppression, replacing the activation function, improving upsampling, adding a small-target layer, and replacing the backbone network with Swin Transformer. In the paper, we report only the methods that actually improved the detection results.

Question 4. We recommend that the authors thoroughly review the spelling and grammar throughout the entire manuscript. This will help ensure the accuracy and clarity of the content.

Response: Thank you for your advice. We have tried our best to check and revise the whole language and correct the grammar. At the same time, we also invited a friend who is good at English and knows this field to revise the language of the paper. Finally, we checked and corrected the language of the whole paper. We sincerely hope that the revised manuscript will be accepted.

Reviewer 4 Report

The authors propose the improved-YOLOv5 algorithm to provide optical remote sensing and ship classification.

The authors provide an overview of evolving trends in the field. However, a number of works that deal with YOLO + satellite images + ship detection, and even classification, are missing.

Related to the experiments themselves, the steps of the experiments are recounted in detail. However, in experiments it is not clear whether the data is divided into training-validation-test datasets or perhaps divided in some other way? If different datasets were not used for validation and testing, the question can be raised how the hyperparameters were determined? Or maybe the same dataset was used both for determining the hyperparameters and for the final testing? Or maybe the algorithm was tested on all the data used in the learning phase?

Author Response

Question 1. There is a lack of a certain number of works that deal with: YOLO + satellite images + ship detection and even classification.

Response: We sincerely thank the reviewer for careful reading. Regarding YOLO series algorithms and ship detection, we added the improved literature of YOLOv7 model to the literature review, and separately divided the literature review into sections to point out the gaps in the literature and the motivation for writing. As for remote sensing images, we simply analyzed and compared the three kinds of images in section 1.

Question 2. In experiments it is not clear whether the data is divided into training-validation-test datasets or perhaps divided in some other way? If different datasets were not used for validation and testing, the question can be raised how the hyperparameters were determined? Or maybe the same dataset was used both for determining the hyperparameters and for the final testing? Or maybe the algorithm was tested on all the data used in the learning phase?

Response: We sincerely thank the reviewer for the careful check, and we apologize for our careless mistakes. In practice, we set the ratio of training set to validation set to 8:2 before conducting the experiments. We have added this information to Section 5.1, Experimental Dataset, of the revised manuscript.

Reviewer 5 Report

Good work in remote sensing ship classification and recognition. Here are some problems to be solved and to be reviewed again:

1.      In the introduction, the author needs to clearly explain what problem the proposed method is aimed at solving, what is the scope of application of the proposed method, and what targeted improvements are made compared to existing methods?

2.      Why choose YOLOv5 as the basic model for improvement? Please clarify the differences of YOLOv3 in "ShipDeNet-20: An only 20 convolution layers and <1-MB lightweight SAR ship detector", "High-speed ship detection in SAR images based on a grid convolutional neural network", and "HyperLi-Net: A hyper-light deep learning network for high-accurate and high-speed ship detection from synthetic aperture radar imagery".

3.      The improvements proposed in the article are all based on the introduction of existing modules. What are the core innovation points of the paper?

4.      Why can CBAM enhance network performance? Please provide a more detailed description. In addition, the ablation experiment failed to demonstrate the effectiveness of CBAM in "Quad-FPN: A novel quad feature pyramid network for SAR ship detection".

5.      Why does W-IOUv3 work better? Is W-IOUv3 more suitable for this task than other losses such as C-IOU? The paper did not provide the corresponding discussion in "High-speed ship detection in SAR images by improved YOLOv3".

6.      Figure 9 is too vague and lacks information. See "Injection of traditional hand-crafted features into modern CNN-based models for SAR ship classification: What, why, where, and how".

7.      The experiments compared in Table 4 are too few and the methods are relatively old. The authors should consider recent methods, especially those designed for remote sensing object detection.

8.      The authors should consider the following works and add them to this paper: "Depthwise separable convolution neural network for high-speed SAR ship detection" and "A mask attention interaction and scale enhancement network for SAR ship instance segmentation".

9.      Revise them: "A full-level context squeeze-and-excitation ROI extractor for SAR ship instance segmentation", "HTC+ for SAR ship instance segmentation", "A polarization fusion network with geometric feature embedding for SAR ship classification", "Balance learning for ship detection from synthetic aperture radar remote sensing imagery", "High-speed ship detection in SAR images based on a grid convolutional neural network", and so on.

10.   IMHO, the Conclusion should be re-written to 1) explicitly describe the essential features/advantages of the review that other reviews do not have, and 2) describe the limitation(s) of the review.

11.   The English should be improved greatly.

Minor editing of English language required

Author Response

Question 1. In the introduction, the author needs to clearly explain what problem the proposed method is aimed at solving, what is the scope of application of the proposed method, and what targeted improvements are made compared to existing methods?

Response: We sincerely thank the reviewer for their careful reading. As described in the present Section 2, the main problem to be solved is the unsatisfactory detection of ship targets in optical remote sensing images: because of the satellite's shooting distance and imaging angle, the targets are multi-scale, mostly small, and closely arranged. We have improved the YOLOv5 model, which is mainly used here to detect small and closely spaced targets in remote sensing images. The main improvements are described briefly in Section 2 and in detail in Section 4.

Question 2. Why choose YOLOv5 as the basic model for improvement? Please clarify the differences of YOLOv3 in "ShipDeNet-20: An only 20 convolution layers and <1-MB lightweight SAR ship detector", "High-speed ship detection in SAR images based on a grid convolutional neural network", and "HyperLi-Net: A hyper-light deep learning network for high-accurate and high-speed ship detection from synthetic aperture radar imagery".

Response: Thank you very much for your questions. YOLOv5 has a simple network structure and fast inference. It offers both high detection accuracy and real-time detection, effectively balancing accuracy and speed, so we chose YOLOv5s as the baseline model for improvement. As for the two models you mentioned, we had not studied them carefully before, so we may not have understood them fully. ShipDeNet-20 has 20 convolution layers and a model size below 1 MB (0.82 MB). It uses fewer layers and kernels, together with depthwise separable convolution (DS-Conv), to keep the model lightweight. HyperLi-Net has fewer parameters, lower computation costs, and a lighter model. It achieves high accuracy through five external modules and high speed through five internal mechanisms. Both are lightweight models that balance high accuracy and fast speed well, and they are worthy of our team's serious study.

Question 3. The improvements proposed in the article are all based on the introduction of existing modules. What are the core innovation points of the paper?

Response: We strongly agree with the reviewer that the improvements to YOLOv5 are all based on the introduction of existing modules. In the early stage we also ran about 40 kinds of experiments, each taking about 18 hours to train. The methods we tried include adding different attention mechanisms, replacing the loss function, improving non-maximum suppression, replacing the activation function, improving upsampling, adding a small-target layer, and replacing the backbone network with Swin Transformer. We found that not all of these changes improve performance. We believe that the main innovation lies in how existing modules are selected and applied to achieve good detection results. In addition, we proposed a median + bilateral filter method for noise reduction, which reduces the interference of water ripples and waves and highlights the ship feature information.
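To make the median + bilateral idea concrete, here is a small pure-Python sketch on a grayscale image stored as a list of rows. It is an illustration only, not the authors' preprocessing code: the order (median first, then bilateral), the 3x3 window, the sigma values, and all function names are assumptions. A production pipeline would more likely call an image library.

```python
import math
from statistics import median

def _window(img, r, c, radius):
    """Yield (dr, dc, value) over a square window, replicating edge pixels."""
    h, w = len(img), len(img[0])
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr = max(0, min(h - 1, r + dr))
            cc = max(0, min(w - 1, c + dc))
            yield dr, dc, img[rr][cc]

def median_filter(img, radius=1):
    """Replace each pixel by the median of its neighbourhood (removes impulses)."""
    return [[median(v for _, _, v in _window(img, r, c, radius))
             for c in range(len(img[0]))] for r in range(len(img))]

def bilateral_filter(img, radius=1, sigma_space=1.0, sigma_range=25.0):
    """Smooth while keeping edges: weights shrink with distance and intensity gap."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            centre = img[r][c]
            num = den = 0.0
            for dr, dc, v in _window(img, r, c, radius):
                w_space = math.exp(-(dr * dr + dc * dc) / (2 * sigma_space ** 2))
                w_range = math.exp(-((v - centre) ** 2) / (2 * sigma_range ** 2))
                num += w_space * w_range * v
                den += w_space * w_range
            row.append(num / den)
        out.append(row)
    return out

def denoise(img):
    """Assumed order: median filter first, then bilateral filter."""
    return bilateral_filter(median_filter(img))

# A flat patch of "water" with one bright speck, and a sharp ship/water edge.
speck = [[100] * 5 for _ in range(5)]
speck[2][2] = 255
edge = [[0, 0, 0, 200, 200, 200] for _ in range(4)]
clean_speck = denoise(speck)
clean_edge = denoise(edge)
```

In this toy case the isolated speck is removed by the median step, while the step edge survives because the bilateral range weight is close to zero across a large intensity gap.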

Question 4. Why can CBAM enhance network performance? Please provide a more detailed description. In addition, the ablation experiment failed to demonstrate the effectiveness of CBAM in "Quad-FPN: A novel quad feature pyramid network for SAR ship detection".

Response: Thank you very much for your questions. Adding the CBAM attention module to the backbone network helps the backbone pay more attention to the region of interest and suppress useless information, which strengthens feature extraction for small targets. Different datasets and insertion locations lead to different experimental results. We previously tried other attention modules, such as CA, ECA, GAM, and SE, which did not improve performance on our dataset.

Question 5. Why does W-IOUv3 work better? Is W-IOUv3 more suitable for this task than other losses such as C-IOU? The paper did not provide the corresponding discussion in "High-speed ship detection in SAR images by improved YOLOv3".

Response: Thank you very much for your questions. In the early stage of dataset production and labeling, we found that some images were blurred, which made the ship features unclear, and missed and false detections appeared later during detection. For this reason, we replaced the loss function with W-IoUv3 to deal effectively with the differences in image quality. It reduces the competitiveness of high-quality anchor boxes and masks the harmful gradients of low-quality examples.
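The behaviour described in this response (less weight on very good and on very poor anchor boxes) can be sketched in plain Python. The formulas below follow the Wise-IoU formulation as we understand it and should be treated as assumptions, not as the authors' training code: the defaults `alpha=1.9` and `delta=3.0`, the function names, and the use of a caller-supplied `mean_l_iou` in place of a running mean are all illustrative, and no gradient handling is modelled.

```python
import math

def iou(box, gt):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    union = ((box[2] - box[0]) * (box[3] - box[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    return inter / union if union > 0 else 0.0

def wiou_v1(box, gt):
    """IoU loss scaled by a centre-distance penalty (assumed v1 form)."""
    l_iou = 1.0 - iou(box, gt)
    dx = (box[0] + box[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (box[1] + box[3]) / 2 - (gt[1] + gt[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    wg = max(box[2], gt[2]) - min(box[0], gt[0])
    hg = max(box[3], gt[3]) - min(box[1], gt[1])
    return math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg)) * l_iou

def focusing_gain(l_iou, mean_l_iou, alpha=1.9, delta=3.0):
    """Non-monotonic gain r = beta / (delta * alpha ** (beta - delta))."""
    beta = l_iou / mean_l_iou  # "outlier degree" of this anchor box
    return beta / (delta * alpha ** (beta - delta))

def wiou_v3(box, gt, mean_l_iou):
    """Assumed v3 form: focusing gain times the v1 loss."""
    return focusing_gain(1.0 - iou(box, gt), mean_l_iou) * wiou_v1(box, gt)

gt = (10.0, 10.0, 50.0, 50.0)
for pred in [(11, 11, 51, 51), (20, 20, 60, 60), (45, 45, 90, 90)]:
    print(pred, round(wiou_v3(pred, gt, mean_l_iou=0.5), 4))
```

The gain peaks at a moderate outlier degree and falls off on both sides, which is one way to read the response's claim about high-quality anchor boxes and low-quality examples.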

Question 6. Figure 9 is too vague and lacks information. See "Injection of traditional hand-crafted features into modern CNN-based models for SAR ship classification: What, why, where, and how".

Response: We sincerely thank the reviewer for the careful reading. Figure 9 lacks clarity because we reduced the size of the original image, which lowered its resolution. To improve the visual effect, we have inserted the original image and rearranged the figure.

Question 7. The experiments compared in Table 4 are too few and the methods are relatively old. The author should consider recent methods, especially those designed for remote sensing object detection.

Response: We think this is an excellent suggestion. Following it, we have added experiments with YOLOv8, the latest version of the YOLO series, released this year. The detection results of YOLOv8 are less satisfactory than those of Improved-YOLOv5; the specific detection results are shown in Figure 16. We will fully consider your suggestions. We will subsequently carry out theoretical studies and design experiments with methods for remote sensing object detection, and we will actively contact other authors to exchange and share the source code of target detection algorithms, so as to further verify the merits of the algorithm in this paper.

Question 8. The authors should consider the following works and add them to this paper: "Depthwise separable convolution neural network for high-speed SAR ship detection" and "A mask attention interaction and scale enhancement network for SAR ship instance segmentation".

Response: We think this is a good suggestion, and we will give it full consideration. The methods and knowledge you mention will be the direction of our subsequent research. Thank you very much for sharing your experience.

Question 9. Revise them: "A full-level context squeeze-and-excitation ROI extractor for SAR ship instance segmentation", "HTC+ for SAR ship instance segmentation", "A polarization fusion network with geometric feature embedding for SAR ship classification", "Balance learning for ship detection from synthetic aperture radar remote sensing imagery", "High-speed ship detection in SAR images based on a grid convolutional neural network", and so on.

Response: We thank the reviewer for sharing this valuable experience. We will study these works, and we will also give due consideration to detecting ship targets in SAR images. Thank you very much for your recognition of and comments on our work; we will further revise and improve it based on your suggestions.

Question 10. IMHO, the Conclusion should be re-written to 1) explicitly describe the essential features/advantages of the review that other reviews do not have, and 2) describe the limitation(s) of the review.

Response: Thank you very much for your careful reading. According to your suggestions, we have rewritten the conclusion of Section 7. For specific details, please refer to the revised manuscript.

Question 11. The English should be improved greatly.

Response: Thank you for your advice. We have tried our best to check and revise the whole language and correct the grammar. At the same time, we also invited a friend who is good at English and knows this field to revise the language of the paper. Finally, we checked and corrected the language of the whole paper. We sincerely hope that the revised manuscript will be accepted.

Reviewer 6 Report

This manuscript proposes an improved Yolov5 method that aims to solve the problems of small target detection and insufficient feature extraction. The proposed detector is composed of several modules that were developed by other researchers. As far as I am concerned, the innovation of this work is limited and the experiments are insufficient. The details of this manuscript should be improved.

1. In Experimental Results and Analysis section, the authors only consider one dataset to verify the effect of the proposed detector. It is insufficient to prove the performance of the proposed detector.

2. In terms of the ablation experiment, the proposed detector fails to perform well for some relatively large ships when all components designed in this manuscript are introduced into the baseline, which could not apply to practical scenarios with ships of different scales.

3. As for the comparative detectors, the authors simply consider several classical methods and one state-of-the-art method. It is suggested that the authors add some recent research works for comparison.

4. A Discussion Section should be added to discuss the advantages and shortcomings of the proposed detector.

5. In Introduction Section, taking reference [13] as an example, the “Zhang P. P. et al.” should be replaced with “Zhang et al.”. The authors should check full text of the manuscript.

6. Some Figures are unclear, such as Figure 9, Figure 12, Figure 13 and Figure 15.

7. The format of the references is not standardized.

 Minor editing of English language required.

Author Response

Question 1. In Experimental Results and Analysis section, the authors only consider one dataset to verify the effect of the proposed detector. It is insufficient to prove the performance of the proposed detector.

Response: Thank you very much for your careful reading. The theme of the special issue to which we are contributing is "Remote Sensing for Maritime Monitoring and Vessel Identification", so we used part of the ship data from the FAIR1M dataset and manually annotated it into 10 categories of common ships. We strongly agree with the reviewer's opinion, and the next focus of our work will be to find another suitable dataset to further verify the detection performance of our model.

Question 2. In terms of the ablation experiment, the proposed detector fails to perform well for some relatively large ships when all components designed in this manuscript are introduced into the baseline, which could not apply to practical scenarios with ships of different scales.

Response: Thank you very much for your questions. We cannot guarantee that the AP of every type of vessel will improve, but the overall result is relatively satisfactory. AP improves to some extent for small targets and closely spaced targets, and the detection results are significantly better than those of the original model, as shown in Figure 14.

Question 3. As for the comparative detectors, the authors simply consider several classical methods and one state-of-the-art method. It is suggested that the authors add some recent research works for comparison.

Response: We think this is an excellent suggestion. Following it, we have added experiments with YOLOv8, the latest version of the YOLO series, released this year. The detection results of YOLOv8 are less satisfactory than those of Improved-YOLOv5; the specific detection results are shown in Figure 16. We will fully consider your suggestions and try a comparison with some other algorithms.

Question 4. A Discussion Section should be added to discuss the advantages and shortcomings of the proposed detector.

Response: Thank you very much for your careful reading. According to your suggestions, we have rewritten the conclusion of Section 7. For specific details, please refer to the revised manuscript.

Question 5. In Introduction Section, taking reference [13] as an example, the “Zhang P. P. et al.” should be replaced with “Zhang et al.”. The authors should check full text of the manuscript.

Response: We sincerely thank the reviewer for the careful check, and we apologize for our careless mistakes. Following your suggestion, we have checked the full text and made the changes.

Question 6. Some Figures are unclear, such as Figure 9, Figure 12, Figure 13 and Figure 15.

Response: Thank you very much for your careful reading. Figures 9, 12, 13 and 15 do not look very clear because we reduced or enlarged the original images, which lowered their resolution. To improve the visual effect, we have inserted the original images or resized them, and then reformatted the figures.

Question 7. The format of the references is not standardized.

Response: We sincerely thank the reviewer for careful checking. We will follow the reference template as well as the reference format of other papers for thorough checking and revision.

Round 2

Reviewer 1 Report

 
  • 1. In the ablation experiment, why choose two modules each time?

  • Why not discuss each module separately?

  • 2. The authors should discuss why using these modules (CBAM, BiFPN network, GSConv structure) yields more accurate results.

no suggestion

Author Response

Question 1. In the ablation experiment, why choose two modules each time? Why not discuss each module separately?

Response: We sincerely thank the reviewer for their careful questions. In the early ablation experiments, we also ran separate experiments on the CBAM, BiFPN+GSConv, and W-IoU modules, but found that none of them improved on the original YOLOv5s when used alone; the results decreased significantly. Therefore, Table 3 shows only the experiments that improved on the original YOLOv5s. Detailed results of the individual experiments for each module are shown in the table below.

Improvement strategy columns: CBAM, BiFPN+GSConv, WIoU. AP (%) is reported per ship class (large-size, medium-size, and small-size groups); mAP_0.5 is in % and FPS in f/s.

Modules enabled   CS    DCS   LCS   PS    WS    ES    SC    FB    TB    MB    mAP_0.5  FPS
None              88.4  93.4  88.7  73    97.6  47.4  56.5  76.7  59    68    74.9     69
One module        89.2  91.8  75.9  67    36.6  22.3  35.8  9.3   38.5  63.9  53       51
One module        88.2  92.4  78.8  82.9  34.2  22.3  41.9  19.9  42.4  65.9  56.9     46
One module        87    93.7  76.6  78.2  33.6  26.5  30.5  24.6  53.9  69.8  57.4     62
All three         89.9  93.9  87.9  82.9  96    59.3  61.3  82.7  58.8  68.6  78.1     75
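As a quick consistency check on the table above, mAP_0.5 can be recomputed as the plain mean of the ten per-class AP values in each row. The sketch assumes that each row lists the ten class APs in header order (CS through MB), followed by mAP_0.5 and FPS; the row labels are ours and only give the row's position.

```python
from statistics import mean

# (per-class AP values in header order CS..MB, reported mAP_0.5) for each row
rows = {
    "row 1": ([88.4, 93.4, 88.7, 73, 97.6, 47.4, 56.5, 76.7, 59, 68], 74.9),
    "row 2": ([89.2, 91.8, 75.9, 67, 36.6, 22.3, 35.8, 9.3, 38.5, 63.9], 53),
    "row 3": ([88.2, 92.4, 78.8, 82.9, 34.2, 22.3, 41.9, 19.9, 42.4, 65.9], 56.9),
    "row 4": ([87, 93.7, 76.6, 78.2, 33.6, 26.5, 30.5, 24.6, 53.9, 69.8], 57.4),
    "row 5": ([89.9, 93.9, 87.9, 82.9, 96, 59.3, 61.3, 82.7, 58.8, 68.6], 78.1),
}

def recompute_map(aps):
    """mAP taken as the unweighted mean of the per-class AP values."""
    return mean(aps)

for name, (aps, reported) in rows.items():
    print(f"{name}: recomputed {recompute_map(aps):.2f}, reported {reported}")
```

Under that reading, every recomputed mean agrees with the reported mAP_0.5 to within rounding.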

Question 2. The authors should discuss why using these modules (CBAM, BiFPN network, GSConv structure) yields more accurate results.

Response: We sincerely thank the reviewer for the careful reading. Following your first-round comments, we have added the functions of these modules to Section 5.5, Ablation Experiment, with a brief description. We have highlighted this section in yellow so that you can review it more easily.

Reviewer 4 Report

The authors have improved the manuscript following the first round of reports.

However, only one reference has been added, so there is still room to more clearly position this work in the context of YOLO + satellite images + ship detection research.

The division into training and test datasets has been clarified, however - since the validation dataset is not mentioned - the question remains as to how the hyperparameters are determined (which btw are explicitly stated).

Author Response

Question1. However, only one reference has been added, so there is still room to more clearly position this work in the context of YOLO + satellite images + ship detection research.

Response:Thank you very much for the reviewer's valuable opinions.According to your first modification comments,we have carefully read and studied a literature on improving YOLOv7 ship target detection based on SAR images just published this year, and quoted it in this manuscript.We also divided the literature review into two sections:target detection algorithm and ship target detection algorithm, and sorted out the improvement work of each classical target detection algorithm, and pointed out the common shortcomings among them.Based on the above situation, we use the color information of optical remote sensing image to identify and classify the ship target.In addition, it takes a certain amount of time and energy to read a complete literature carefully.Subsequently, we will continue to look for literature on this subject to read and study in detail.If the reviewer has relevant literature materials, you can also share them with us, so that we can better learn and make progress.

Question 2. The division into training and test datasets has been clarified, however - since the validation dataset is not mentioned - the question remains as to how the hyperparameters are determined (which btw are explicitly stated).

Response: We sincerely thank the reviewer for the careful checking. Following your first-round suggestion, we have clarified that the ratio of training set to validation set is 8:2. Because the dataset contains a small number of samples, some of the images from the validation set were also used for testing. Regarding the hyperparameter settings, we used the default hyperparameters of the original YOLOv5s and explained the experimental parameter settings in Table 2.
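As an illustration of the 8:2 ratio mentioned in the response, the following is a minimal sketch of a shuffled training/validation split. The file names, the dataset size of 1000, the fixed seed and the helper name `split_dataset` are all assumptions made for the example; this record does not describe how the authors actually drew their split, and the reuse of validation images for testing is not modelled here.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into (train, validation) lists."""
    shuffled = list(items)
    # A dedicated Random instance keeps the split reproducible (assumed seed).
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names; the real dataset layout is not given in this record.
images = [f"ship_{i:04d}.jpg" for i in range(1000)]
train, val = split_dataset(images)
print(len(train), len(val))  # prints: 800 200
```

With only two subsets, any hyperparameter chosen by looking at validation scores is also tuned on data later used for testing, which is the concern behind the reviewer's question.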

Reviewer 5 Report

The response should be better considered in the paper text [x]. So a minor revision is needed.

Minor editing of English language required

Author Response

Question 1. The response should be better considered in the paper text [x]. So a minor revision is needed.

Response: Thank you very much for the reviewer's reminder; we will fully consider the reviewer's comments. We collated our responses to the six reviewers and selectively added some of that content to the manuscript, with changes or additions to the manuscript highlighted in yellow.

Question 2. Minor editing of English language required.

Response: Thank you for your advice. We have done our best to check and revise the language throughout and to correct the grammar. We also invited a friend who is proficient in English and familiar with this field to check and correct the language of the whole paper. We sincerely hope that the revised manuscript will be accepted.

Reviewer 6 Report

My concerns are still not addressed. In addition, some issues should be further solved:

1. In terms of the contributions of this manuscript, the authors list four parts. However, these four components are proposed by other research works. So what is the key point of your manuscript?

2. In the ablation experiments, there exist five combinations of the improvement strategies. However, for three improvement strategies, eight combinations of the improvement strategies should be considered. It is strongly suggested that the authors add the other three combinations of the improvement strategies to prove the effectiveness of the proposed improvement strategies.

3. Some errors occur in the manuscript, such as the citation of the references below Figure 8. It is suggested that the authors check the full text of the manuscript.

4. A Discussion Section should be added to discuss the advantages and shortcomings of the proposed detector.

Minor editing of English language required.

Author Response

Question 1. In terms of the contributions of this manuscript, the authors list four parts. However, these four components are proposed by other research works. So what is the key point of your manuscript?

Response: We strongly agree with the reviewer that the improvements to YOLOv5 are all based on introducing existing modules. In the early stage we ran about 40 experiments, each taking about 18 hours to train. The improvement methods we tried include adding different attention mechanisms, replacing the loss function, improving non-maximum suppression, replacing the activation function, improving upsampling, adding a small-target layer, and replacing the backbone network with Swin Transformer. We found that not all of these changes improve performance. We believe that the main innovation lies in how existing modules are selected and applied, specifically in adding them at different locations in the YOLOv5 network to achieve better detection results. In addition, we proposed a median + bilateral filtering method for noise reduction to reduce the interference of water ripples and waves and to highlight the ship feature information.
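To make the median + bilateral idea concrete, here is a minimal pure-Python sketch on a small grayscale grid. It is not the authors' implementation: the filter order (median first, then bilateral), the 3 x 3 window, the values of `sigma_space` and `sigma_range`, and the border handling are all assumptions, and in practice an image library (for example OpenCV's `medianBlur` and `bilateralFilter`) would normally be used instead.

```python
import math
from statistics import median


def _window(img, y, x, r):
    """Collect (dy, dx, value) within radius r, clamping coordinates at the borders."""
    h, w = len(img), len(img[0])
    out = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            out.append((dy, dx, img[yy][xx]))
    return out


def median_filter(img, r=1):
    """Replace each pixel by the median of its (2r+1) x (2r+1) neighbourhood."""
    return [[median(v for _, _, v in _window(img, y, x, r))
             for x in range(len(img[0]))]
            for y in range(len(img))]


def bilateral_filter(img, r=1, sigma_space=1.0, sigma_range=25.0):
    """Weighted mean; weights decay with spatial distance and intensity difference."""
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            centre = img[y][x]
            num = den = 0.0
            for dy, dx, v in _window(img, y, x, r):
                weight = math.exp(-(dy * dy + dx * dx) / (2 * sigma_space ** 2)
                                  - (v - centre) ** 2 / (2 * sigma_range ** 2))
                num += weight * v
                den += weight
            row.append(num / den)
        out.append(row)
    return out


# Toy image: dark "water" on the left, bright "ship" on the right.
clean = [[10, 10, 10, 200, 200] for _ in range(5)]
noisy = [row[:] for row in clean]
noisy[2][1] = 255  # a single bright speck standing in for a wave highlight

denoised = median_filter(noisy)          # removes the isolated speck
smoothed = bilateral_filter(denoised)    # smooths while keeping the edge
print(denoised[2][1], round(smoothed[2][2], 3), round(smoothed[2][3], 3))
```

The median step targets isolated outliers, while the bilateral step averages only over pixels of similar intensity, so the boundary between the dark and bright regions is not blurred.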

Question 2. In the ablation experiments, there exist five combinations of the improvement strategies. However, for three improvement strategies, eight combinations of the improvement strategies should be considered. It is strongly suggested that the authors add the other three combinations of the improvement strategies to prove the effectiveness of the proposed improvement strategies.
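The figure of eight follows from treating each of the three strategies as either on or off, giving 2^3 = 8 configurations including the unmodified baseline. A minimal sketch of that enumeration, with the helper name `all_combinations` chosen only for illustration:

```python
from itertools import product

strategies = ["CBAM", "BiFPN+GSConv", "WIoU"]


def all_combinations(names):
    """Return every on/off assignment as a tuple of the enabled strategy names."""
    combos = []
    for flags in product([False, True], repeat=len(names)):
        combos.append(tuple(n for n, on in zip(names, flags) if on))
    return combos


combos = all_combinations(strategies)
for combo in combos:
    print(" + ".join(combo) if combo else "baseline (no strategy)")
print(len(combos))  # 2**3 = 8
```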

Response: We sincerely thank the reviewer for the careful questions. In the early ablation experiments, we also tested the CBAM, BiFPN+GSConv and W-IoU modules separately, but found that each module on its own did not improve on the original YOLOv5s and in fact performed significantly worse. Therefore, Table 3 shows only the experiments that improved on the original YOLOv5s. Detailed results of the individual experiments for each module are shown in the table below.

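Improvement strategies: CBAM, BiFPN+GSConv, WIoU. AP (%) is given per ship class; in the original table the class columns are grouped under Large-size, Medium-size and Small-size. mAP_0.5 is in % and FPS in f/s. The marks showing which single module each of the three middle rows uses are not preserved in this record.

| Strategies enabled | CS | DCS | LCS | PS | WS | ES | SC | FB | TB | MB | mAP_0.5 (%) | FPS (f/s) |
| None | 88.4 | 93.4 | 88.7 | 73 | 97.6 | 47.4 | 56.5 | 76.7 | 59 | 68 | 74.9 | 69 |
| Single module | 89.2 | 91.8 | 75.9 | 67 | 36.6 | 22.3 | 35.8 | 9.3 | 38.5 | 63.9 | 53 | 51 |
| Single module | 88.2 | 92.4 | 78.8 | 82.9 | 34.2 | 22.3 | 41.9 | 19.9 | 42.4 | 65.9 | 56.9 | 46 |
| Single module | 87 | 93.7 | 76.6 | 78.2 | 33.6 | 26.5 | 30.5 | 24.6 | 53.9 | 69.8 | 57.4 | 62 |
| All three | 89.9 | 93.9 | 87.9 | 82.9 | 96 | 59.3 | 61.3 | 82.7 | 58.8 | 68.6 | 78.1 | 75 |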
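The mAP_0.5 column in the table above can be cross-checked against the per-class AP values, since mAP is the mean of the class APs. The sketch below copies the five result rows from the table in order; the row numbering and the helper name `mean_ap` are only for illustration.

```python
# Each entry: (ten per-class AP values in table order, reported mAP_0.5).
rows = [
    ([88.4, 93.4, 88.7, 73, 97.6, 47.4, 56.5, 76.7, 59, 68], 74.9),
    ([89.2, 91.8, 75.9, 67, 36.6, 22.3, 35.8, 9.3, 38.5, 63.9], 53),
    ([88.2, 92.4, 78.8, 82.9, 34.2, 22.3, 41.9, 19.9, 42.4, 65.9], 56.9),
    ([87, 93.7, 76.6, 78.2, 33.6, 26.5, 30.5, 24.6, 53.9, 69.8], 57.4),
    ([89.9, 93.9, 87.9, 82.9, 96, 59.3, 61.3, 82.7, 58.8, 68.6], 78.1),
]


def mean_ap(per_class_ap):
    """Mean average precision as the arithmetic mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


for index, (aps, reported) in enumerate(rows, start=1):
    computed = mean_ap(aps)
    print(f"row {index}: computed {computed:.2f}, reported {reported}")
```

Every reported value agrees with the computed mean to within rounding at one decimal place, which also supports reading each row as ten class APs followed by mAP_0.5 and FPS.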

Question 3. Some errors occur in the manuscript, such as the citation of the references below Figure 8. It is suggested that the authors check the full text of the manuscript.

Response: We sincerely thank the reviewer for the careful checking. We apologize for our careless mistakes. As suggested by the reviewer, we have checked all references and made consistent corrections. Please see the revised manuscript for details.

Question 4. A Discussion Section should be added to discuss the advantages and shortcomings of the proposed detector.

Response: Thank you very much for the reviewer's valuable comments. Following your suggestion, we have added the advantages and disadvantages of the improved detector to Section 6, Conclusions. Please see the revised manuscript for details.

Question 5. Minor editing of English language required.

Response: Thank you for your advice. We have done our best to check and revise the language throughout and to correct the grammar. We also invited a friend who is proficient in English and familiar with this field to check and correct the language of the whole paper. We sincerely hope that the revised manuscript will be accepted.
