Review Reports - Multi-Type Ship Target Detection in Complex Marine Background Based on YOLOv11

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Is the problem of multi-type ship detection in complex marine backgrounds clearly stated?
How well does the introduction convey the importance and relevance of this problem?
Does the paper provide a comprehensive review of existing methods for ship detection in marine environments?
Are the limitations of previous methods clearly identified and explained?
Why was YOLOv11 chosen or developed for this task?
How does YOLOv11 theoretically or practically improve upon previous YOLO versions or other detection algorithms?

Is the architecture of YOLOv11 thoroughly described, including any novel modifications or enhancements?
Are there diagrams or illustrations that aid in understanding the model architecture?
Are the specific modifications from previous YOLO versions clearly explained and justified?
Does the paper detail the training process, including data preprocessing, augmentation, and hyperparameter settings?
How does YOLOv11 handle feature extraction differently from other models?
Are the benefits of these feature extraction techniques demonstrated or discussed?

Is the dataset used for training and testing adequately described (e.g., size, diversity, complexity)?
Are the data splits (training, validation, testing) clearly outlined?
Are appropriate metrics (e.g., precision, recall, F1-score, mAP) used to evaluate the model's performance?
Are the results presented in a clear and understandable manner?
Does the paper include comparisons with other state-of-the-art methods or previous YOLO versions?
Are the comparative results statistically significant and well-justified?

Are visual examples of detection results provided to illustrate the model's performance?
Do these examples effectively highlight the strengths and potential weaknesses of YOLOv11?

Author Response

Comments 1: Is the problem of multi-type ship detection in complex marine backgrounds clearly stated?

How well does the introduction convey the importance and relevance of this problem?

Response 1: Thank you for pointing this out. The introduction now further explains the importance of maintaining maritime safety, smooth maritime traffic and orderly operation, and the importance of real-time detection and effective interpretation of ships as key carriers at sea. It introduces the difficulties and key points of ship detection by describing the complexity of the marine environment. It then analyzes the shortcomings of traditional ship target detection methods and artificial intelligence-based methods, which leads to the improved method for multi-type ship detection in complex environments proposed in this paper.

Comments 2: Does the paper provide a comprehensive review of existing methods for ship detection in marine environments?

Are the limitations of previous methods clearly identified and explained?

Response 2: Thank you for pointing this out. The introduction now further elaborates the classification, advantages, disadvantages and limitations of traditional ship target detection methods. It also analyzes and compares deep learning-based target detection algorithms to identify their limitations in multi-type ship detection in complex environments.

Comments 3: Why was YOLOv11 chosen or developed for this task?

Response 3: Thank you for pointing this out. The introduction outlines the detection accuracy and speed of the YOLOv11 network and notes that the model is lighter. The second part compares a variety of deep learning-based target detection algorithms and describes in detail the detection accuracy, speed and network structure of YOLOv11. Compared with previous algorithms, YOLOv11 has stronger performance and can meet the requirements of real-time, accurate detection of various types of ships at sea.

Comments 4: How does YOLOv11 theoretically or practically improve upon previous YOLO versions or other detection algorithms?

Is the architecture of YOLOv11 thoroughly described, including any novel modifications or enhancements?

Are there diagrams or illustrations that aid in understanding the model architecture?

Are the specific modifications from previous YOLO versions clearly explained and justified?

Response 4: Thank you for pointing this out. The second part of the paper describes the new improvements of the YOLOv11 network in detail from three aspects: network structure, loss function design and network training. The network structure part describes the improvement and optimization of YOLOv11 in three parts: the backbone network, the feature fusion layer (Neck) and the detection head. Figures 3 to 12 illustrate each improvement and give a comprehensive description of the entire YOLOv11 network. The third part of the paper describes the improvement of YOLOv11 in detail and compares it with the original modules and methods to demonstrate the effectiveness of the improved method, using multiple comparison diagrams and flow charts.

Comments 5: Does the paper detail the training process, including data preprocessing, augmentation, and hyperparameter settings?

Response 5: Thank you for pointing this out. The fourth part of the paper is the experimental part, which describes the specific process from dataset construction and environment setup to parameter setting.

Comments 6: How does YOLOv11 handle feature extraction differently from other models?

Are the benefits of these feature extraction techniques demonstrated or discussed?

Response 6: Thank you for pointing this out. The improved YOLOv11 introduces the EfficientNetV2 model and enhances it with the CA attention mechanism, further improving the model's ability to learn the features of multi-type target ships in complex environments. It constructs a new CCB (C3k2 ConvNeXt Block) module using a span information connection mode, which avoids the problem that the YOLOv11 network focuses only on the location of local pixels and thus improves the model's ability to capture context information. It adopts the WIoU loss function, which balances difficult and easy samples more effectively. Gradient gain is used to reduce the influence of harmful gradients while preserving the effect of high-quality anchor boxes, which can improve the overall performance of the ship detection model. Chapter 3 describes these improvements in detail and compares them with the corresponding modules and methods in the original YOLOv11 network. Chapter 4 tests their feasibility and effectiveness through ablation experiments.
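
For illustration, the loss mentioned in Response 6 can be sketched for a single pair of axis-aligned boxes. This is a minimal sketch of a WIoU v1-style loss, assuming the commonly published form in which the IoU loss is scaled by a distance-based factor; the response does not state which WIoU version the manuscript uses, and the function names `iou` and `wiou_v1` are hypothetical.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1(pred, target):
    """WIoU v1-style loss (assumed form): R * (1 - IoU).

    R = exp(center_distance^2 / (Wg^2 + Hg^2)), where Wg and Hg are the
    width and height of the smallest box enclosing both inputs. In an
    autograd framework the denominator would be detached from the graph;
    plain floats are used here, so no detaching is needed.
    """
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    denom = wg ** 2 + hg ** 2
    # Guard against degenerate (zero-size) boxes.
    r = math.exp(((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / denom) if denom > 0 else 1.0
    return r * (1.0 - iou(pred, target))
```

The factor `r` is at least 1 and grows with the distance between box centers, so poorly centered predictions are penalized more than the plain IoU loss would penalize them. The dynamic focusing (gradient gain) described in the response belongs to later WIoU variants and is not shown here.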

Comments 7: Is the dataset used for training and testing adequately described (e.g., size, diversity, complexity)?

Are the data splits (training, validation, testing) clearly outlined?

Response 7: Thank you for pointing this out. The dataset construction part of Chapter 4 describes the types of ships in the dataset, the number of ships of each type and the total number of images, and uses tables and sample images to present the dataset more intuitively. The images of the self-built dataset are divided into a training set and a test set in a ratio of 8:2.
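
An 8:2 split such as the one described in Response 7 could be reproduced along the following lines. This is a hedged sketch only: the response does not say whether the split was random, stratified by ship type, or fixed by a seed, so the shuffling, the `seed` parameter and the function name `split_dataset` are assumptions.

```python
import random


def split_dataset(image_paths, train_ratio=0.8, seed=0):
    """Shuffle image paths reproducibly and split them by train_ratio."""
    paths = sorted(image_paths)          # fixed starting order
    rng = random.Random(seed)            # local RNG, leaves global state alone
    rng.shuffle(paths)
    n_train = int(round(len(paths) * train_ratio))
    return paths[:n_train], paths[n_train:]


# Hypothetical file names, used only to show the call.
train_set, test_set = split_dataset([f"img_{i}.jpg" for i in range(100)])
```

Sorting before shuffling makes the result independent of the order in which the file list was read, so the same seed always yields the same split.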

Comments 8: Are appropriate metrics (e.g., precision, recall, F1-score, mAP) used to evaluate the model's performance?

Are the results presented in a clear and understandable manner?

Response 8: Thank you for pointing this out. This paper evaluates model performance by comparing the precision, recall, mAP and detection time of the networks, and presents the results more intuitively and clearly through PR curves, a comparison of mAP values for each ship type, and images of actual detection results.
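
As a reminder of how the metrics named in Response 8 relate to each other, the sketch below builds a precision-recall curve for one class and integrates it into an average precision (AP) value. It assumes detections have already been matched to ground truth at some IoU threshold and uses all-point interpolation; the manuscript's exact evaluation protocol is not stated in the response, and the function names are hypothetical.

```python
def precision_recall_curve(detections, num_gt):
    """detections: list of (confidence, is_true_positive) for one class.

    Returns precision and recall after each detection, taken in order of
    decreasing confidence. num_gt is the number of ground-truth objects.
    """
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in ordered:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    return precisions, recalls


def average_precision(precisions, recalls):
    """Area under the PR curve with all-point interpolation."""
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

Computing AP per ship type and then averaging gives the mAP, which is why a per-class comparison and a single mAP figure can both be reported from the same PR curves.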

Comments 9: Does the paper include comparisons with other state-of-the-art methods or previous YOLO versions?

Response 9: Thank you for pointing this out. In the experimental part, the improved YOLOv11 method is compared with YOLOv8, YOLOv10, YOLOv11, R2-CNN and CenterNet (Hourglass) to verify the performance of the method.

Comments 10: Are the comparative results statistically significant and well-justified?

Response 10: Thank you for pointing this out. The experimental data are the average detection accuracy and time of all images. The performance difference between algorithms is not caused by random error.
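
Response 10 does not state how random error was ruled out. One common check, shown here only as a hedged sketch and not as the procedure used in the manuscript, is a paired bootstrap over per-image scores of two detectors evaluated on the same test images. The function name, the default of 2000 resamples and the 95% interval are all assumptions.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the mean per-image difference A - B.

    scores_a and scores_b must be per-image scores of two models on the
    same images, in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same images")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample images with replacement and record the mean difference.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

If the resulting interval excludes zero, the observed difference is unlikely to be explained by the particular choice of test images alone. Repeating training with several random seeds would additionally address variation from training itself.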

Comments 11: Are visual examples of detection results provided to illustrate the model's performance?

Do these examples effectively highlight the strengths and potential weaknesses of YOLOv11?

Response 11: Thank you for pointing this out. Chapter 4 shows actual detection results for scenes such as single targets, occluded targets, small targets, multi-type targets and complex backgrounds, to demonstrate the detection performance of the model in complex environments and the advantages of the improved YOLOv11 method in detection accuracy and speed.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The authors created and implemented a good method to detect multi-type ships. The paper explains the technique in detail. I do not have any questions about the technique. I have some minor revisions for the paper.

1. The resolution of some figures (Figs. 1, 3, 5, 17) is too low. When I zoom in, they are distorted. Please revise them and add higher-resolution images.

2. Line 418 starts with: input image size of this experiment was 640640… I think it is better to write it as 640x640.

3. Line 533 should start with ‘Feature’ not ‘feature’.

4. Please revise Figure 22. Some images are not aligned.

5. The authors placed references at the end of the sentence, after the period. Should they be placed before the period? For example:

The subsequent improved networks proposed make the network model more streamlined, and the training and detection speed have been greatly improved. [8]

The subsequent improved networks proposed make the network model more streamlined, and the training and detection speed have been greatly improved [8].

Please check all references in the paper.

Author Response

Comments 1: The resolution of some figures (Figs. 1, 3, 5, 17) is too low. When I zoom in, they are distorted. Please revise them and add higher-resolution images.

Response 1: Thank you for pointing this out. I have improved the resolution of Figs. 1, 3, 5 and 17.

Comments 2: Line 418 starts with: input image size of this experiment was 640640… I think it is better to write it as 640x640.

Response 2: Thank you for pointing this out. I have replaced 640640 with 640x640.

Comments 3: Line 533 should start with ‘Feature’ not ‘feature’.

Response 3: Thank you for pointing this out. I have replaced "feature" with "Feature".

Comments 4: Please revise Figure 22. Some images are not aligned.

Response 4: Thank you for pointing this out. Figure 22 has been modified so that the images are aligned.

Comments 5: The authors placed references at the end of the sentence, after the period. Should they be placed before the period? For example:

The subsequent improved networks proposed make the network model more streamlined, and the training and detection speed have been greatly improved. [8]

The subsequent improved networks proposed make the network model more streamlined, and the training and detection speed have been greatly improved [8].

Please check all references in the paper.

Response 5: Thank you for pointing this out. I have placed each reference before the period at the end of the sentence and checked all references in the paper.
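
A check of this kind can be partly automated. The following is a minimal sketch, assuming bracketed numeric citations such as [8], [8, 9] or [8-10] in plain text; the function name `move_citation_before_period` is hypothetical, and the result still needs manual review.

```python
import re

# A period, optional whitespace, then a bracketed numeric citation.
# \u2013 is an en dash, sometimes used in citation ranges.
_CITATION_AFTER_PERIOD = re.compile(r"\.\s*(\[\d+(?:\s*[,\-\u2013]\s*\d+)*\])")


def move_citation_before_period(text):
    """Rewrite 'sentence. [8]' as 'sentence [8].'."""
    return _CITATION_AFTER_PERIOD.sub(r" \1.", text)


fixed = move_citation_before_period(
    "the training and detection speed have been greatly improved. [8]"
)
```

The pattern cannot tell a sentence-ending period from an abbreviation, so a phrase such as "et al. [8]" would be changed as well and should be excluded or corrected by hand.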

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have presented a method for accurate detection and identification of dynamic multi-type ship targets in the complex marine environment based on YOLOv11. The manuscript seems interesting, but the authors need to address the following concerns:

1) Abstract needs to be rewritten as the wording needs to be improved.

2) Conclusion is too long. It should be made concise.

3) Comparison should be performed with the recent methods.

4) Authors should elaborate on the challenges encountered as well as the limitations of the presented work.

5) Authors are suggested to consider the environmental factors as well.

6) Evaluation predictors description can be omitted as they are available in literature.

7) Pseudo-code should be provided in the manuscript.

8) Explain the PR curve in more detail.

Author Response

Comments 1: Abstract needs to be rewritten as the wording needs to be improved.

Responses 1: Thank you for pointing this out. I have rewritten the abstract, improving the wording with more accurate words and smoother sentences.

Comments 2: Conclusion is too long. It should be made concise.

Responses 2: Thank you for pointing this out. The conclusion has been rewritten: duplicate statements were removed and the improved results of the algorithm are emphasized, making the conclusion more concise.

Comments 3: Comparison should be performed with the recent methods.

Responses 3: Thank you for pointing this out. In this paper, the improved algorithm is compared with YOLOv11, with YOLOv10 (proposed in 2024) and YOLOv8 (proposed in 2023), and with a typical network, CenterNet.

Comments 4: Authors should elaborate on the challenges encountered as well as the limitations of the presented work.

Responses 4: Thank you for pointing this out. In the introduction and Chapter 1, the paper gives a detailed comparative analysis of traditional methods and deep learning-based ship target detection algorithms, and describes the application scenarios, existing problems and limitations of the different methods. By analyzing the complexity of the current marine environment, it establishes the necessity and urgency of accurate and efficient ship detection.

Comments 5: Authors are suggested to consider the environmental factors as well.

Responses 5: Thank you for pointing this out. In the dataset construction part, environmental complexity is increased by including images taken in rain and fog and images of islands, reefs, coastal ports and docks.

Comments 6: Evaluation predictors description can be omitted as they are available in literature.

Responses 6: Thank you for pointing this out. I have omitted the description of the evaluation predictors.

Comments 7: Pseudo-code should be provided in the manuscript.

Responses 7: Thank you for pointing this out. The pseudo-code is provided after Figure 17.

Comments 8: Explain the PR curve in more detail.

Responses 8: Thank you for pointing this out. For Figure 22, the meaning of the PR curve is now explained in more detail in the paper.