MSTFT: Mamba-Based Spatio-Temporal Fusion for Small Object Tracking in UAV Videos
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
First of all, thank you for giving me the chance to review your paper. I have found some flaws in it, which are listed below.
- Please include some quantitative results, expressed as percentages, at the end of the abstract.
- The material under heading 2, Related Work, is essentially preliminaries, so it should become a subsection titled Preliminaries within the Introduction.
- Section 3 also reads like preliminaries, because as a standalone heading it makes no sense in the organization of the article: it is unclear whether it is the methodology or something else. Please give Sections 2 and 3 clearer headings so that the reader is not confused.
- The Related Work section itself should be clearer and should cover work by other researchers that resembles yours.
- At the end of the Related Work section, add a paragraph that encapsulates the reasons that motivated this work.
- The caption of Figure 1 is too long; please make it more concise.
- The caption of Figure 2 is too long; please make it more concise.
- Please increase the font size of the text in Figure 3 (superscripts and subscripts), because it is not clearly legible at normal zoom.
- You used A100 GPUs, which are very powerful hardware for a task like yours, but the article does not explain how the method's power and energy needs would be met when it is deployed in a UAV-based video tracking system. Please provide results on the energy and power requirements of your system.
- The triple safety verification mechanism certainly improves robustness, but it also significantly increases system complexity. Please provide a graph comparing the additional computational overhead of this approach with that of other approaches used for safety verification.
- Your loss function modifications need stronger validation on an individual basis to assess the true impact of each.
- The dataset used in your work consists of small objects, which looks somewhat biased to me, because in some scenarios medium-sized and large objects also need to be tracked. Please consider this and provide results for such cases.
- You have not provided pseudocode for your method. Please add appropriate pseudocode as an algorithm so that readers who want to reproduce your work for their own analysis can do so.
- A memory consumption analysis for your chosen GPU in the bidirectional scanning scenario is missing. Please provide the details in a table.
- Also discuss what other options the research community may have, and briefly discuss the cost of the hardware.
- You have provided almost no solid results for failure-case analysis, such as prolonged full occlusion or severe camera jitter. Please provide some discussion and results for these cases.
Author Response
Please see the attachment.
Reviewer 2 Report
Comments and Suggestions for Authors
The article studies the problem of tracking small objects in UAV videos. The research topic is relevant. The article fills a gap in existing research and proposes an improved method, MSTFT, a Mamba-based spatio-temporal fusion tracker. The authors present the mathematical basis of the proposed method and evaluate its effectiveness by comparison with other modern methods. The proposed tracker demonstrates stability in complex scenarios involving small targets, fast motion, overlap, and background interference. The results are presented in figures and tables, and their analysis demonstrates the effectiveness of the proposed MSTFT method. The references listed are cited at the corresponding points in the text. The authors present a significant amount of research results, analyze them, and identify directions for future research.
The authors did not mention any limitations in the article. If the presented study has limitations, they should be mentioned in the Discussion section. The authors should also pay attention to the following comments:
At the end of the Introduction section, it is advisable to give a brief description of the content of the following sections of the article.
Line 165: To shorten the caption under Figure 1, move the sentence ‘It depicts the entire pipeline, beginning with …’ to the main text explanation.
Line 193: The text should reference and explain Figure 2 before the figure appears; at present the reference to Figure 2 is in line 231. The sentence “It depicts the inner spatial Mamba ...” can be moved from the caption under Figure 2 to the main text.
Lines 211-213: The parameter ‘ϵ’ from formula (3) needs to be explained.
Lines 275-287: The parameter ‘ϵ’ from formula (11) needs to be explained.
Lines 375-382: The parameter ‘N’ from formulas (19) and (20) needs to be explained.
Line 453: The text needs to explain Figure 5 and Figure 6, including what the numbers in the captions under each graph mean, for example, 109 in ‘...Scale Variation (109)’.
Line 520: The text of the article needs to include a reference to Figure 7.
Line 535: The text of the article needs to include a reference to Figure 8.
Line 555: The text of the article needs to include a reference to Figure 9.
Line 579: A reference to Table 4 should be provided in the text of the article.
Author Response
Please see the attachment.
Reviewer 3 Report
Comments and Suggestions for Authors
The article "MSTFT: Mamba-Based Spatio-Temporal Fusion for Small Object Tracking in UAV Videos" presents a significant research case. As a first step and as an intention, its authors have made important strides. The research is suitable for Electronics (ISSN 2079-9292) and is sure to be of interest to the journal's readers. Unfortunately, the authors have not avoided some typical errors in the construction of the article, which we must point out.
1) Please remove all abbreviations from the title and abstract. Abbreviations should be used only in the main text, after each acronym has been explained. In addition, we would advise adding a table of abbreviations to support the reader not only in studying the article but also in following its basic argumentation.
2) From the very first point, authors must adhere strictly to the academic framework governing scientific publications, starting with the abstract. It is advisable to reconsider the function of the Abstract as an introductory text and to remove information or data that is more appropriately presented in later sections of the article. The Abstract should be approximately ten lines long. In terms of content, it is recommended that a clear and coherent introductory text be formulated, which includes: (1) the context of the problem, (2) the purpose of the study, (3) the research questions, (4) the coded research objective, (5) a brief overview of the methodology, (6) the basic design tools, and (7) the main findings, results, and contribution of the study. All of the above should be formulated in concise, comprehensive, and highly focused sentences.
3) Every term must be explained at the first use of its abbreviation: give the full term first, followed by the abbreviation in parentheses. Once this has been done, do not write out the term again; use only the abbreviation.
4) References should not be listed cumulatively or indiscriminately without clear justification for their use ([13–24].....[13–18].....[20–23] etc). The simultaneous juxtaposition of multiple references, without an explicit explanation of the reason for using each one, lacks academic documentation and creates confusion for the reader. Bibliographic references in a scientific article are not decorative elements, but tools for documenting specific claims, positions, or findings. Consequently, each reference must be used for a clear and specific purpose: to document a specific argument, to support a methodological choice, to define the theoretical framework, or to be directly linked to the research findings. It is not acceptable to leave the reader to "freely" interpret the reason for which a source is cited. Authors must explicitly explain why they refer to each work and exactly what contribution of it is used at each point in the article. This practice must be applied consistently throughout the work, correcting any instances of unclear or undocumented use of references.
5) The incomprehensible bombardment of references raises the next methodological question: if you want to organize a state-of-the-art, organize it; if you want to do an analytical field review, do it, but bombarding the reader with a disorganized series of references before they even start reading the article is not justified. Say what you want to do, explain the key parts of the research, and develop the argumentation of your case without excessive mass citations.
6) At the end of the introduction, authors should accurately add the scientific and research questions of the article, which will be answered in the conclusions. After the scientific and research questions, authors should cite the individual chapters of the article with a precise description and justification.
7) Organize it throughout the article: Where there are chapters and subchapters, start the development with a short introductory part that justifies the division. For example, in 2. Related Work, and in 2.1. UAV Visual Tracking, write a short introduction to the chapter.
8) A figure (Figure 1) with five lines of title cannot be accepted. Please fix. Move the text to the article documentation and leave the image with a simple description.
9) A figure (Figure 2) with five lines of title cannot be accepted. Please fix. Move the text to the article documentation and leave the image with a simple description.
10) It is not clear how Table 1 was created and how and with which tool Figure 4 was produced. Since this is the documentation of the article, please explain the selection tools in detail.
11) It is not clear how Figure 5 was produced. Since this is the documentation of the article, please explain the selection tools in detail.
12) It is not clear how Figure 6 was produced. Since this is the documentation of the article, please explain the selection tools in detail.
13) It is not clear how Figure 7 was produced. Since this is the documentation of the article, please explain the selection tools in detail.
14) It is not clear how Figure 8 was produced. Since this is the documentation of the article, please explain the selection tools in detail.
15) It is not clear how Figure 9 was produced. Since this is the documentation of the article, please explain the selection tools in detail.
16) The conclusions cannot be accepted in their current form, quite apart from being disproportionately short relative to the rest of the text. In the conclusions, in addition to the answers to the scientific and research questions, the authors should note the overall added value of the article, the difficulties in implementing parts of the research, and finally, suggestions for future work.
The manuscript, as the subject of a laboratory test, contains exceptional field verifications. Unfortunately or fortunately, the formulation of the argumentation must also follow the level of the findings. Otherwise, the paper will be completely unjustified.
Author Response
Please see the attachment.
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
We accepted the authors' revised manuscript with great sincerity and great joy. The authors, with high scientific ethics and full technical training, made every effort to improve the manuscript. It is remarkable how an ambitious and broad text, through targeted changes and the addition of some supplementary experiments, was transformed into an excellent one. We owe thanks to the authors, who believed in the selflessness and honesty of our opinions and delivered us a text much better than we expected. This must be credited to the open policy of the journal and to the entire editorial team. Both the changes and the parts that were added enhanced the technical integrity of the work and scientifically and technically enriched its content. We, in turn, warmly appreciate the authors - researchers for their dedication and look forward to their next publication. They have now fully assimilated the criteria of a complete publication.
