Review Reports - ShipMOT: A Robust and Reliable CNN-NSA Filter Framework for Marine Radar Target Tracking

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This article introduces ShipMOT, a customized multi-object tracking framework specifically designed for ship tracking in marine environments. At its core, ShipMOT employs a CNN-based detector to identify ships within radar images. Following detection, a novel Nonlinear State Augmentation (NSA) filter is utilized as the motion prediction model. This filter generates more accurate ship trajectories that closely align with actual movement patterns, significantly reducing errors typically caused by linear prediction models. Furthermore, instead of relying on the conventional Intersection over Union (IoU) metric, ShipMOT adopts a refined Bounding Box Similarity Index (BBSI). This index incorporates assessments of both shape similarity and center point consistency, thereby minimizing incorrect ID associations, especially in scenarios involving dense and intersecting ship traffic.

However, a few changes need to be made:

In the abstract there are personal forms, "we did". The abstract should be in an impersonal form.
The abstract is very extensive and should be shortened to the most important information.
In the abstract, the sentences “In maritime supervision, the continuous and stable tracking of ship positions using marine radar is crucial for ensuring navigation safety. However, traditional target tracking methods often struggle to maintain stability in marine radar (MR) images due to the complex maritime environment, which is characterized by nonlinear ship movements, dense traffic, and intersecting trajectories.” should be deleted. The abstract should present the main information about the novelty and results of the article.
To enrich the article, it would be worth presenting the methodology in a scheme.
In the research part there is Figure 5. I would suggest dividing it into more figures or adding a), b), c)....
Please expand and elaborate on the summary.

Once the changes have been made, the article may be considered for publication.

Comments on the Quality of English Language

The English is good but can be checked by a language specialist.

Author Response

Comments 1: In the abstract there are personal forms, "we did". The abstract should be in an impersonal form.

Response 1: Thank you for pointing this out. We have rewritten the abstract of this paper based on the feedback received.

Comments 2: The abstract is very extensive and should be shortened to the most important information.

Response 2: Thank you for your valuable feedback and suggestions regarding the abstract of our paper. We have rewritten the abstract of this paper based on the feedback received.

Comments 3: In the abstract, the sentences “In maritime supervision, the continuous and stable tracking of ship positions using marine radar is crucial for ensuring navigation safety. However, traditional target tracking methods often struggle to maintain stability in marine radar (MR) images due to the complex maritime environment, which is characterized by nonlinear ship movements, dense traffic, and intersecting trajectories.” should be deleted. The abstract should present the main information about the novelty and results of the article.

Response 3: Thank you for your valuable feedback and suggestions regarding the abstract of our paper. We have made the revisions according to your suggestions.

Comments 4: To enrich the article, it would be worth presenting the methodology in a scheme.

Response 4: Thank you for your valuable feedback and suggestions regarding the abstract of our paper. We have made the revisions according to your suggestions.

Comments 5: In the research part there is Figure 5. I would suggest dividing it into more figures or adding a), b), c)..

Response 5: Thank you very much. Following your suggestion, we have revised this figure and divided it into more figures.

Comments 6: Please expand and elaborate on the summary.

Response 6: Thank you for your suggestion. Based on your suggestion, conclusions on the accuracy and real-time performance of the proposed method have been added to the concluding section, along with a discussion of extending the method to other fields.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The issues presented in the manuscript are relevant to the ship control process, especially when one wants to ensure autonomy. However, the description of the problem and its solution should, in my opinion, be improved for better readability and understanding.

In the Literature review section, in my opinion, background on NSA filtering should be described. I suggest referring to the literature and giving the reasoning for this choice.
The presented prediction and update procedure is incomplete. Please show the coefficients of the matrices and how they were computed, so that other researchers can replicate the process. Please show how the transition matrix and process noise at time k are computed. Moreover, the state vector elements and the construction of the covariance matrix should be described. Please present how the Kalman gain is computed. The presented description of the methodology is, in my opinion, very simple and insufficient.
There is a need to present how targets are tracked; unfortunately, the presented figures do not give the reader this knowledge. Only bounding boxes are shown, but the predicted trajectories also need to be marked.
In relation to lines 262-264, please give an example of how it increases tracking accuracy. I recommend providing a metric that quantitatively proves this increase.
In Fig. 4, arrows and boxes are hard to identify. Please improve their visibility.
I suggest some improvements to Fig. 5. In the first row, please add a grid. Could you clarify why Classification and val Classification are empty figures? If they should remain as they are, please describe the reason; otherwise, delete them. I recommend adding a, b, c, etc. to each graph and describing them in the text body.
In Fig. 6 labels are very small and illegible. Please correct this issue.
In Section 4.2 you use metrics to prove the validity of your algorithm. These metrics, in my opinion, should be described. There is a need to present how these values are computed; therefore, equations describing these metrics are needed.
Regarding Fig. 7, please describe briefly the mechanism of ID switching, why this phenomenon occurs, and what kind of values this algorithm returns. In my opinion, there is also a need to present the motion vectors of these ships.
In Fig. 8 you present two separate algorithms. Please describe them both, e.g. using pseudocode or an algorithmic block diagram. There is also a need to show how they work in this particular case, so a short description is needed. There is also the question of what the key points are that allow ShipMOT to keep the ship ID in this particular case.
Conclusions are very limited. Please expand them.

Comments on the Quality of English Language

In the sentence in L. 72-73 there are grammar errors. In line 38 "typically" is repeated. There are also several "-" in the middle of words, e.g. L. 11, 14, 35, 293. In L. 183 there is an unnecessary space sign.

Author Response

Comments 1: In the Literature review section, in my opinion, background on NSA filtering should be described. I suggest referring to the literature and giving the reasoning for this choice.

Response 1: Thank you for pointing this out. We have added section 2.3 titled "Nonlinear State Augmentation (NSA) Filter" to the literature review.

Comments 2: The presented prediction and update procedure is incomplete. Please show the coefficients of the matrices and how they were computed, so that other researchers can replicate the process. Please show how the transition matrix and process noise at time k are computed. Moreover, the state vector elements and the construction of the covariance matrix should be described. Please present how the Kalman gain is computed. The presented description of the methodology is, in my opinion, very simple and insufficient.

Response 2: Your suggestion has been very helpful. The computation of the transition matrix and process noise at time k has been added in L.232-240. The computation of the Kalman gain has been added in L.251-260.

Comments 3: There is a need to present how targets are tracked; unfortunately, the presented figures do not give the reader this knowledge. Only bounding boxes are shown, but the predicted trajectories also need to be marked.

Response 3: Thank you for your valuable suggestions. We have presented how targets are tracked in L.279-291.

Assume that the detection boxes from previous frames represent the ship's trajectory in those frames. On reaching the present frame, the ship's trajectory is first predicted from its past motion direction using a filtering method. Both NSA filtering and KF employ linear models for prediction, yielding a trajectory forecast by the Kalman linear model. When detection boxes are associated with the predicted trajectories, the present frame's detection box and the predicted trajectory are fused and updated to obtain the final output trajectory of the ship, completing the tracking task. NSA filtering dynamically adjusts the trust weight of the detection boxes, so the output ship trajectory aligns more closely with the actual detection boxes. This dynamic adjustment allows NSA filtering to adapt more effectively to changes in the target's movement, as illustrated in Figure 2. Conversely, KF relies on a fixed update scale, so its output trajectory tends to stay closer to the predictions of its linear model.
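
The contrast between a fixed update scale and a dynamically adjusted trust weight can be illustrated with a scalar Kalman update. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the trust weight is realized by scaling the measurement noise variance by (1 - confidence), which is one common way to build a confidence-adaptive Kalman filter. The function names and all numbers are hypothetical.

```python
def kalman_update(x_pred, p_pred, z, r):
    """One scalar Kalman measurement update (observation model H = 1).

    Returns the fused state, its variance, and the Kalman gain.
    """
    gain = p_pred / (p_pred + r)          # weight given to the detection
    x_new = x_pred + gain * (z - x_pred)  # blend prediction and detection
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, gain


def confidence_scaled_update(x_pred, p_pred, z, r, confidence):
    """Same update, but the measurement noise shrinks as confidence grows.

    The (1 - confidence) scaling is an assumed rule for illustration only.
    """
    r_scaled = (1.0 - confidence) * r
    return kalman_update(x_pred, p_pred, z, r_scaled)


# Hypothetical numbers: the linear model predicts 10.0, the detector reports 14.0.
x_pred, p_pred = 10.0, 4.0
z, r = 14.0, 4.0

x_kf, _, k_kf = kalman_update(x_pred, p_pred, z, r)
x_nsa, _, k_nsa = confidence_scaled_update(x_pred, p_pred, z, r, confidence=0.9)

print(f"fixed noise:      gain={k_kf:.3f}, fused position={x_kf:.3f}")
print(f"confidence 0.9:   gain={k_nsa:.3f}, fused position={x_nsa:.3f}")
```

With a confident detection, the scaled variant yields a larger gain, so the fused position moves closer to the detection box, while the fixed-noise update stays halfway between prediction and detection in this example.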

Comments 4: In relation to lines 262-264, please give an example of how it increases tracking accuracy. I recommend providing a metric that quantitatively proves this increase.

Response 4: Thank you very much. We have conducted an ablation study to analyze the effects of BBSI on the algorithm's tracking performance. The results are detailed in Table 2, which quantitatively demonstrates the performance improvements brought about by incorporating BBSI.

Comments 5: In Fig. 4, arrows and boxes are hard to identify. Please improve their visibility.

Response 5: Your suggestions are highly valued. We have revised the figure by thickening the arrows and boxes, thereby enhancing its visibility.

Comments 6: I suggest some improvements to Fig. 5. In the first row, please add a grid. Could you clarify why Classification and val Classification are empty figures? If they should remain as they are, please describe the reason; otherwise, delete them. I recommend adding a, b, c, etc. to each graph and describing them in the text body.

Response 6: Thank you for your suggestion. According to your suggestions, we have revised this figure and divided it into more figures.

Comments 7: In Fig. 6 labels are very small and illegible. Please correct this issue.

Response 7: We appreciate your valuable feedback. To improve the visibility and overall quality of the figure, we have enlarged the image and replaced it with a clearer version where the boxes are more distinguishable.

Comments 8: In Section 4.2 you use metrics to prove the validity of your algorithm. These metrics, in my opinion, should be described. There is a need to present how these values are computed; therefore, equations describing these metrics are needed.

Response 8: Your suggestion is incredibly useful. We have introduced and explained the evaluation metrics along with their relevant formulas in L. 481-497.
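
This report does not name the metrics that were added to the manuscript. As a hedged illustration of the kind of formula the reviewer requested, the sketch below computes two widely used multi-object tracking scores, MOTA and IDF1, from their standard definitions. Whether the paper uses exactly these metrics is an assumption, and the counts are made-up numbers, not results from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(id_true_pos, id_false_pos, id_false_neg):
    """ID F1 score: 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denom = 2 * id_true_pos + id_false_pos + id_false_neg
    if denom == 0:
        raise ValueError("at least one count must be non-zero")
    return 2 * id_true_pos / denom


# Hypothetical counts accumulated over all frames of a sequence.
example_mota = mota(false_negatives=50, false_positives=30, id_switches=20, num_gt=1000)
example_idf1 = idf1(id_true_pos=900, id_false_pos=50, id_false_neg=150)

print(f"MOTA = {example_mota:.3f}, IDF1 = {example_idf1:.3f}")
```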

Comments 9: Regarding Fig. 7, please describe briefly the mechanism of ID switching, why this phenomenon occurs, and what kind of values this algorithm returns. In my opinion, there is also a need to present the motion vectors of these ships.

Response 9: Thank you very much for your suggestion. We have provided an explanation of the ID switching mechanism specific to nonlinear ship motion tracking in L. 597-613.
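
For readers unfamiliar with the term, an ID switch is counted when a ground-truth ship that was previously matched to one tracker ID becomes matched to a different one. The sketch below is a simplified counter under that definition; the input format (one dictionary per frame mapping a ground-truth ship ID to the tracker ID assigned to it) is a hypothetical choice for illustration, not the format used in the paper.

```python
def count_id_switches(assignments):
    """Count ID switches over a sequence of frames.

    assignments: list of dicts, one per frame, mapping ground-truth ship id
    to the tracker id matched to it in that frame.
    """
    last_seen = {}   # ground-truth id -> most recent tracker id
    switches = 0
    for frame in assignments:
        for gt_id, track_id in frame.items():
            if gt_id in last_seen and last_seen[gt_id] != track_id:
                switches += 1
            last_seen[gt_id] = track_id
    return switches


# Two ships cross paths at frame 3 and the tracker swaps their identities.
frames = [
    {1: "a", 2: "b"},
    {1: "a", 2: "b"},
    {1: "b", 2: "a"},
    {1: "b", 2: "a"},
]
num_switches = count_id_switches(frames)
print(num_switches)
```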

Comments 10: In Fig. 8 you present two separate algorithms. Please describe them both, e.g. using pseudocode or an algorithmic block diagram. There is also a need to show how they work in this particular case, so a short description is needed. There is also the question of what the key points are that allow ShipMOT to keep the ship ID in this particular case.

Response 10: Your suggestion is very useful. We have introduced the flow of ByteTrack and explained how ShipMOT maintains ship IDs in this figure in L. 624-644.

Comments 11: Conclusions are very limited. Please expand them.

Response 11: Thank you for your suggestion. Based on your suggestion, conclusions on the accuracy and real-time performance of the proposed method have been added to the concluding section, along with a discussion of extending the method to other fields.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper presents ShipMOT, a novel multi-object tracking framework designed for ship tracking in marine radar images. By integrating a CNN-based detector with a Nonlinear State Augmentation (NSA) filter and a Bounding Box Similarity Index (BBSI) for improved data association, ShipMOT aims to enhance tracking accuracy in challenging maritime environments. The experimental evaluation on the Radar-Track dataset demonstrates its superior performance compared to existing tracking algorithms. However, while the paper introduces valuable innovations, there are areas where further clarifications, evaluations, and improvements can be made. Below are detailed minor and major technical comments to enhance the clarity, depth, and completeness of the study.

Minor Comments:
1. Clarify the specific advantage of NSA filtering over traditional Kalman filtering in scenarios with abrupt motion changes.
2. Specify how ShipMOT handles occlusions or temporary disappearances of ships in the radar images.
3. Include a discussion on potential computational overhead introduced by BBSI compared to IoU-based matching.
4. Explain how the proposed method deals with false positives in dense traffic conditions.
5. Justify why YOLOv7 was chosen as the detector over other advanced CNN-based architectures.
6. Provide more details on how data augmentation (e.g., random flipping, cropping) improves model performance in maritime scenarios.
7. Discuss the trade-offs between ShipMOT’s accuracy and real-time performance (e.g., 32.36 FPS).
8. Add a comparison of NSA filtering with other recent motion prediction techniques beyond Kalman filtering.
9. Elaborate on how the Radar-Track dataset was labeled to ensure high annotation accuracy.
10. Clarify if ShipMOT is adaptable to non-marine multi-object tracking applications.

Major Comments:
1. The paper lacks an explicit discussion on the limitations of ShipMOT. Address potential weaknesses, such as performance in highly cluttered environments or adverse weather conditions.
2. The ablation study primarily focuses on NSA filtering and BBSI. Include an additional study evaluating the impact of the detector’s architecture on overall tracking performance.
3. Provide more details on the hyperparameter tuning process used for ShipMOT to optimize its tracking performance.
4. The introduction claims ShipMOT significantly outperforms other methods, but statistical significance tests (e.g., confidence intervals) are not reported. Add a statistical analysis to support the claims.
5. The Radar-Track dataset is stated to contain challenging maritime conditions, but no quantitative analysis is provided regarding the diversity of scenarios (e.g., distribution of ship speeds, density levels). Include a dataset analysis.
6. The experimental comparison is missing a discussion on computational complexity. Report inference time per frame and GPU/CPU usage for a fair comparison with other trackers.
7. Provide more insight into how the bounding box similarity index (BBSI) is optimized and whether its parameters need tuning for different scenarios.
8. Expand the literature review to include recent advancements in transformer-based tracking methods and their relevance to maritime applications.
9. The visualization results of ShipMOT in different tracking scenarios are useful, but more failure cases should be analyzed to highlight areas for improvement.
10. The conclusion suggests future work but does not propose specific approaches to improving robustness against radar noise and environmental variations. Provide a concrete roadmap for future improvements.

Author Response

Minor Comments:

Comments 1: Clarify the specific advantage of NSA filtering over traditional Kalman filtering in scenarios with abrupt motion changes.

Response 1: Thank you for pointing this out. We have added the advantages of NSA filtering over traditional Kalman filtering in L.305-319.

Comments 2: Specify how ShipMOT handles occlusions or temporary disappearances of ships in the radar images.

Response 2: Your suggestion has been very helpful. A description of how ShipMOT handles occlusions or temporary disappearances of ships in the radar images has been added in L.195-200.

Comments 3: Include a discussion on potential computational overhead introduced by BBSI compared to IoU-based matching.

Response 3: Thank you for your valuable suggestions. We have presented the computational overhead introduced by BBSI compared to IoU-based matching in L.541-549.

Comments 4: Explain how the proposed method deals with false positives in dense traffic conditions.

Response 4: Thank you very much. In dense ship traffic scenarios, multiple crossing ships can be approximated as overlapping navigation patterns. In frame 37, although the bounding boxes of Ship 7 and Ship 10 exhibit high IoU overlap, their length and width differences provide distinguishable characteristics. ShipMOT leverages length similarity and width similarity assessments of bounding boxes to effectively differentiate trajectories between ships. In frames 8 and 37, despite high length-width similarity between Ship 3 and Ship 4’s bounding boxes, their distinct movement paths create opportunities for trajectory distinction. This enables algorithms to exploit center-distance analysis between bounding boxes for separation. By employing discriminative center-distance criteria, the BBSI metric successfully distinguishes and accurately associates trajectories of ships with divergent movement paths.
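
The response describes BBSI as combining IoU overlap with length similarity, width similarity, and a center-distance criterion, but this report does not give the formula. The sketch below is therefore a hypothetical stand-in that averages four such terms with equal weight; the individual term definitions, the function names, and the example boxes are all assumptions made for illustration. Boxes are (x1, y1, x2, y2) tuples.

```python
import math


def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center_consistency(a, b):
    """1 when centers coincide; decreases with center distance, normalized
    by the diagonal of the smallest box enclosing both (assumed form)."""
    dist = math.hypot((a[0] + a[2] - b[0] - b[2]) / 2, (a[1] + a[3] - b[1] - b[3]) / 2)
    diag = math.hypot(max(a[2], b[2]) - min(a[0], b[0]), max(a[3], b[3]) - min(a[1], b[1]))
    return 1.0 - dist / diag if diag > 0 else 1.0


def side_similarity(s1, s2):
    """Ratio of the shorter to the longer side, in [0, 1] (assumed form)."""
    longest = max(s1, s2)
    return min(s1, s2) / longest if longest > 0 else 1.0


def bbsi_sketch(a, b):
    """Equal-weight average of the four terms; NOT the paper's formula."""
    width_sim = side_similarity(a[2] - a[0], b[2] - b[0])
    length_sim = side_similarity(a[3] - a[1], b[3] - b[1])
    return (iou(a, b) + center_consistency(a, b) + width_sim + length_sim) / 4.0


track = (0, 0, 10, 4)        # predicted box of an existing track
same_ship = (2, 0, 12, 4)    # same size, slightly shifted
larger_ship = (0, 0, 20, 8)  # overlaps the track but is twice as large

print(f"same-size detection: {bbsi_sketch(track, same_ship):.3f}")
print(f"larger detection:    {bbsi_sketch(track, larger_ship):.3f}")
```

In this toy case the larger box fully covers the track, yet its size mismatch lowers the combined score, which is the kind of separation the response attributes to the shape terms.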

Comments 5: Justify why YOLOv7 was chosen as the detector over other advanced CNN-based architectures.

Response 5: Your suggestions are highly valued. YOLOv7 employs an Extended Efficient Layer Aggregation Network (E-ELAN) to achieve effective feature extraction for small ship targets. Additionally, because it relies on a single-stage detection architecture, YOLOv7 runs faster than two-stage detection algorithms. We therefore chose YOLOv7 as the detector.

Comments 6: Provide more details on how data augmentation (e.g., random flipping, cropping) improves model performance in maritime scenarios.

Response 6: Thank you for your suggestion. Following your suggestion, we have provided more details on how data augmentation improves model performance.

In varying weather conditions such as clear, rainy and foggy days, MR images often exhibit varying tonal levels. Data augmentation techniques based on white balance adjustments can be employed to enrich radar image datasets with diverse weather scenarios, thereby enhancing the generalization capabilities of detection models. Random cropping involves extracting training samples from localized image regions. By cropping regions at different scales, the model becomes adept at recognizing multi-scale object characteristics, thereby simulating situations where ships may be partially occluded or only partially visible. Random flipping, which horizontally or vertically mirrors images, simulates targets appearing in different orientations within the radar field of view, enabling the model to adapt to ships with varying trajectory directions. Overall, data augmentation techniques improve detection model generalization by focusing on intrinsic object features while reducing overfitting risks.
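
The flipping and cropping operations described above must also transform the ship bounding boxes, which is easy to get wrong. The sketch below shows one way to do this on a toy image stored as a list of rows; it is an illustration only, not the authors' training pipeline. The helper names are hypothetical, boxes are (x1, y1, x2, y2) in pixel-edge coordinates with x2 and y2 exclusive, and in real training the flip decision and crop window would be drawn at random.

```python
def hflip(image, boxes):
    """Mirror an image left-right and remap its bounding boxes."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    new_boxes = [(width - x2, y1, width - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped, new_boxes


def crop(image, boxes, left, top, right, bottom):
    """Crop to columns [left, right) and rows [top, bottom).

    Boxes are clipped to the window and shifted to its origin; boxes that
    fall entirely outside the window are dropped.
    """
    cropped = [row[left:right] for row in image[top:bottom]]
    new_boxes = []
    for x1, y1, x2, y2 in boxes:
        cx1, cy1 = max(x1, left) - left, max(y1, top) - top
        cx2, cy2 = min(x2, right) - left, min(y2, bottom) - top
        if cx2 > cx1 and cy2 > cy1:
            new_boxes.append((cx1, cy1, cx2, cy2))
    return cropped, new_boxes


# Toy 4 x 6 radar image with one 2 x 2 "ship" echo.
image = [[0] * 6 for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        image[r][c] = 1
boxes = [(1, 1, 3, 3)]

flipped, flipped_boxes = hflip(image, boxes)
cropped, cropped_boxes = crop(image, boxes, left=2, top=0, right=6, bottom=4)

print(flipped_boxes)   # ship now appears on the right-hand side
print(cropped_boxes)   # only part of the ship remains visible
```

The cropped case reproduces the partially visible ship situation mentioned in the response: the box is clipped rather than discarded.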

Comments 7: Discuss the trade-offs between ShipMOT’s accuracy and real-time performance (e.g., 32.36 FPS).

Response 7: We appreciate your valuable feedback. Based on your suggestion, conclusions on the accuracy and real-time performance of the proposed method have been added to the concluding section, along with a discussion of extending the method to other fields.

Comments 8: Add a comparison of NSA filtering with other recent motion prediction techniques beyond Kalman filtering.

Response 8: Your suggestion is incredibly useful. We have added a comparison of NSA filtering with other motion prediction techniques in Table 3 and L. 554-573.

Comments 9: Elaborate on how the Radar-Track dataset was labeled to ensure high annotation accuracy.

Response 9: Thank you very much for your suggestion. We have provided an explanation of how the Radar-Track dataset is labeled to ensure high annotation accuracy in L. 416-432. The specific workflow is illustrated in a new figure.

Comments 10: Clarify if ShipMOT is adaptable to non-marine multi-object tracking applications.

Response 10: Your suggestion is very useful. The perspective of radar imagery resembles an overhead view of a scene. ShipMOT demonstrates potential for application in bird’s-eye view monitoring scenarios such as traffic surveillance from an aerial drone perspective. While this research does not conduct direct experiments on multi-object tracking for non-marine targets, ShipMOT can effectively track targets under similar monitoring perspectives to MR images. Future researchers can build upon this methodology to explore ShipMOT’s tracking capabilities in bird’s-eye view scenarios like aerial drone-based traffic monitoring.

Major Comments:
Comments 1: The paper lacks an explicit discussion on the limitations of ShipMOT. Address potential weaknesses, such as performance in highly cluttered environments or adverse weather conditions.

Response 1: Thank you for pointing this out. ShipMOT has certain limitations. First, the NSA filter relies on high-precision localization information from detection bounding boxes. If the localization quality of ship detection data is low, it significantly impacts the prediction accuracy of the NSA filter. Additionally, ShipMOT adopts a two-stage framework for ship target tracking, which requires running two separate algorithmic components, increasing computational complexity. Furthermore, the detection stage cannot leverage tracking information, reducing information utilization efficiency.

Comments 2: The ablation study primarily focuses on NSA filtering and BBSI. Include an additional study evaluating the impact of the detector’s architecture on overall tracking performance.

Response 2: Your suggestion has been very helpful. To investigate the impact of detectors on tracking performance, YOLOv5 and SSD algorithms are trained on the same dataset as the ShipMOT detector, using identical training parameters and epochs. Subsequently, these two comparative algorithms replace the detector component in ShipMOT, showing how detector performance influences the overall tracking efficacy of the framework. The results are shown in Table 4.

Comments 3: Provide more details on the hyperparameter tuning process used for ShipMOT to optimize its tracking performance.

Response 3: Thank you for your valuable suggestions. ShipMOT’s hyperparameters require no manual tuning under normal conditions, unless applied to customized BBSI scenarios.

Comments 4: The introduction claims ShipMOT significantly outperforms other methods, but statistical significance tests (e.g., confidence intervals) are not reported. Add a statistical analysis to support the claims.

Response 4: Thank you very much. We have employed five comparative tracking algorithms to compare against ShipMOT, in order to demonstrate the superiority of ShipMOT.

Comments 5: The Radar-Track dataset is stated to contain challenging maritime conditions, but no quantitative analysis is provided regarding the diversity of scenarios (e.g., distribution of ship speeds, density levels). Include a dataset analysis.

Response 5: Your suggestions are highly valued. We have added a new figure and conducted a statistical analysis of ship positions and sizes in the radar dataset in L. 386-397.

Comments 6: The experimental comparison is missing a discussion on computational complexity. Report inference time per frame and GPU/CPU usage for a fair comparison with other trackers.

Response 6: Thank you for your suggestion. Following your suggestion, we have compared the tracking speeds of different trackers in Table 1 and L. 523-526.

Comments 7: Provide more insight into how the bounding box similarity index (BBSI) is optimized and whether its parameters need tuning for different scenarios.

Response 7: We appreciate your valuable feedback. We have added relevant descriptions.

In general, BBSI's parameters require no manual tuning, simplifying its application in data association costs. For customized scenarios, if further optimization is needed, weights can be assigned to metrics like IoU overlap, center consistency, length similarity, and width similarity to emphasize specific dimensions in bounding box matching.

Comments 8: Expand the literature review to include recent advancements in transformer-based tracking methods and their relevance to maritime applications.

Response 8: Your suggestion is incredibly useful. We have introduced transformer-based tracking methods and their relevance to maritime applications in L. 101-118.

Comments 9: The visualization results of ShipMOT in different tracking scenarios are useful, but more failure cases should be analyzed to highlight areas for improvement.

Response 9: Thank you very much for your suggestion. We have provided an explanation of how we handle false positives in dense traffic conditions for ship tracking in L. 654-663. The specific workflow is illustrated in a new figure.

Comments 10: The conclusion suggests future work but does not propose specific approaches to improving robustness against radar noise and environmental variations. Provide a concrete roadmap for future improvements.

Response 10: Your suggestion is very useful. The experiments primarily concentrate on harbor scenarios. Future work will explore ship multi-object tracking in inland waterway environments. While the current research has not yet implemented targeted enhancements to the detector, upcoming studies plan to integrate Vision Transformer modules into detection algorithms. Future efforts will also emphasize broadening the diversity of radar image scenarios. Additionally, subsequent research will consider incorporating visual detection interference factors into MR images to enhance the model's robustness and generalization capabilities.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The literature review was extended and, in my opinion, may be treated as complete. In the present version of the manuscript, the flowchart of the ShipMOT framework is clear and describes the algorithm in a good, simple way. The metrics and the way the parameters are computed are presented. The authors addressed all the reviewer's comments and made changes that significantly improved the quality of the manuscript.

I have only a few minor remarks, mainly concerning editorial errors:

In L. 221, 224, 268, 255 and 256 please delete unnecessary space signs. In particular, we do not put a space before punctuation marks.
In eq. (20), the equation number should be placed together with the equation in L. 350, instead of L. 351.
I recommend placing whole Table 3 on one page.
I suggest placing Section 5 title on the next page.

Comments on the Quality of English Language

Please have a look at punctuation marks and grammar issues. For example, in L. 221 "assumed" should be used instead of "assume".

Author Response

Comments 1: In L. 221, 224, 268, 255 and 256 please delete unnecessary space signs. In particular, we do not put a space before punctuation marks.

Response 1: Thank you for pointing this out. We have already made modifications.

Comments 2: In eq. (20), the equation number should be placed together with the equation in L. 350, instead of L. 351.

Response 2: Your suggestion has been very helpful. We are sorry for our mistake.

Comments 3: I recommend placing whole Table 3 on one page.

Response 3: Thank you for your valuable suggestions. We have made the revisions as advised.

Comments 4: I suggest placing Section 5 title on the next page.

Response 4: Thank you very much. We have updated our content accordingly.