Open Access Article

Peer-Review Record

FusionTrack: Multiple Object Tracking with Enhanced Information Utilization

Appl. Sci. 2023, 13(14), 8010; https://doi.org/10.3390/app13148010

by Yifan Yang^1,†, Ziqi He^1,†, Jiaxu Wan^2,†, Ding Yuan^2, Hanyang Liu^2, Xuliang Li^2 and Hong Zhang^2,*

Reviewer 1: Niranjana Sampathila

Reviewer 2: Jongseong Brad Choi

Reviewer 3: Anonymous

Submission received: 30 May 2023 / Revised: 24 June 2023 / Accepted: 1 July 2023 / Published: 8 July 2023

(This article belongs to the Special Issue Intelligent Analysis and Image Recognition)

Round 1

Reviewer 1 Report

1. Give details on Figures 1a, 1b and 1c.

2. Section 3 may require 2-3 introductory statements before Section 3.1.

3. Section 3 may be labeled "Methodology" or "Materials and Methods". Section 3.2 may be titled "FusionTrack" instead of "Overview".

4. Section 4 may begin with a few important details on the experiments before Section 4.1.

5. Include a result-based highlight in the conclusion.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

General comments

The paper proposes the FusionTrack method for MOT. My general concern is that the paper falls short of providing a comprehensive and detailed explanation of the proposed methods and their advantages over existing methods. The evaluation of the proposed model is limited to a single dataset, which raises questions about the generalizability of the research. The paper needs major revisions to address these issues.

Detailed comments

The paper lacks a clear and logical structure. The sections are not clearly separated, which makes the paper very difficult for readers to follow. It would be helpful if the authors could provide more details on the implementation specifics of the FusionTrack model. In the Results, a more analytical comparison with existing methods would be clearer for readers than simply restating the statistical data in a paragraph.

The paper limits its evaluation to DanceTrack. This raises questions about the model's performance on other datasets with different characteristics, and about the robustness and generalizability of the model.

The paper also fails to discuss the potential limitations of the proposed model and future work. A more balanced discussion and acknowledging limitations would improve the paper quality.

Suggestions

• Separate the sections into a logical structure. Currently it is Introduction, Related Works, FusionTrack, Experiments, Analysis and Discussion, Conclusions. Instead, it should be, e.g.: Introduction, Related Literature, Methodology, Validation, Analysis and Discussion, Conclusion (not 'conclusions'). Other suggested sections would be Limitations and Future Work.

• Provide more detailed explanations of the underlying principles. The rationale behind these design choices should also be justified.

• Discuss potential limitations of the model and future work. The conclusion should be supported by the results presented in the article or referenced in secondary literature.

• Moreover, one of the major problems of the paper lies in awkward phrasing and run-on sentences (sentences with two or more independent clauses merged into a single complex sentence). This makes the paper very hard for readers to follow.

• Limit the usage of “we”. Instead, you could say “this paper”, “this research”, etc.

• Use proper figure and table titles. Figure 4 “We use an...” is not a figure title. Make the titles concise.

Here are some improvements the author can make:

• Line 10: "We introduce FusionTrack to address these issues ... between frames." Consider breaking this and similar sentence structures into shorter sentences, e.g.: “We introduce FusionTrack to address these issues. It utilizes ... between frames.”

• Line 31: “...well thanks to...” is not professional. I suggest going with “due to” or “because of the”. The author should check for similar mistakes (casual phrasing) throughout the paper.

• Line 35: “However, some things could be improved... scenes.”. “Some things” is not professional to say. I would suggest specifying the context, e.g.: “However, there are still some areas for improvement in the existing TBA methods ...”.

• Lines 38-39: “... a negative correlation between...”. I believe you mean “trade-off between”.

• Line 49: Same issue as line 10.

• Line 67: “We proposed a joint track-detection decoder... for both parts.”. “Both parts” is also not professional. It would be better to specify what these parts are, e.g.: “We proposed a joint track-detection decoder to fuse information between the track and detection branches, thereby improving the performance of both these components.” There are other similar mistakes in the paper.

• Lines 76-79: “The goal of... just one object”. MOT is explained twice: once at the start, and again after “while”. It is redundant to explain the same concept twice.

• Line 294: “Propagate track queries” to “Propagate the track queries”

• Lines 295-297: The sentence can be split.

• Lines 306-307: “The appearance features and two embeddings... the next frame” The sentence is very vague for the readers. It can be improved by explaining what “two embeddings” refers to, e.g.: “The appearance features, along with the confidence and delta time embeddings, are concatenated and then passed through an MLP to obtain track queries for the subsequent frame.”

• Lines 366-367: “We randomly sample... and flip augmentations”. It would be better to focus on the object instead of “we”, the researchers performing the action, e.g.: “For DanceTrack, four frames are randomly sampled from a ten-frame interval around the key image...”. Also, “random resize, crop hue and flip augmentations.” should use gerunds (verb + “ing”, because they are being used as nouns), e.g.: “random resizing, cropping, hue adjustments, and flipping are applied.” I have seen similar mistakes throughout the paper.

• Table 2. Remove (ours). It is redundant.

• Line 443: “The upper part of the figure... change across frames”. “Upper part” is not professional. Instead the author could say “The upper segment of the figure illustrates...”. Many other sections of the paper can be improved similarly. The paper stops at being just grammatically correct. However, I suggest more use of expressive language and tone. It will surely improve the quality of the paper, help readers understand it better, and reduce boredom.

...and similar awkward phrasing and errors throughout the paper.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper addresses the challenges in multi-object tracking (MOT) within computer vision. While existing methods perform well in simple tasks like pedestrian tracking, they struggle with complex scenarios involving objects with diverse motion and uniform appearances. Inspired by the DETR approach, the paper introduces the tracking-by-attention (TBA) method that utilizes Transformers for MOT. However, TBA methods face several issues in stage scenes, including difficulties in object detection and tracking due to gradient conflicts in shared parameters, inaccurate detection of objects with complex motion, and insufficient utilization of features to distinguish similar objects.

1. The experimental evaluation on the DanceTrack dataset demonstrates a significant improvement of 10.2% in the HOTA metric compared to the baseline model MOTR. However, the paper lacks a comparison with other state-of-the-art methods in the field, which is essential to evaluate the competitiveness of FusionTrack.

2. Adding more details on the implementation and architectural design of FusionTrack would be beneficial for understanding the methodology thoroughly.

3. The paper could provide more context on the significance of stage scenes in multi-object tracking and how addressing the challenges in these scenes contributes to the field.

4. It would be valuable to discuss the computational complexity and runtime performance of FusionTrack to assess its practical applicability.

5. The conclusion could provide insights into the broader impact of FusionTrack beyond stage scenes and suggest potential future directions for further improvements or extensions of the proposed method.

Overall, the paper presents a promising solution, FusionTrack, for enhancing multi-object tracking through improved information utilization. Addressing the mentioned points would strengthen the paper and provide a more comprehensive evaluation of FusionTrack's contributions and performance in comparison to other state-of-the-art methods.

Minor editing of English language required.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors responded to all of my comments, and I am satisfied with the modified manuscript.
