Open Access Article

Peer-Review Record

Human Trajectory Prediction Based on a Single Frame of Pose and Initial Velocity Information

Electronics 2025, 14(13), 2636; https://doi.org/10.3390/electronics14132636

by Yucheng Huang * and Hong Yan

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 27 May 2025 / Revised: 23 June 2025 / Accepted: 26 June 2025 / Published: 30 June 2025

(This article belongs to the Special Issue Recent Progress in Visual AI: Architectures, Learning, and Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper proposes a new RNN for human trajectory prediction based on only a single frame. The work is interesting, as predicting such a trajectory from a single frame is novel. I consider the use of NNs in this use case to be very beneficial, and I find the novelties important for this field. Also, the research questions and goals are well described in the article. The figures and tables are clear and legible. I do not find any major weaknesses in the paper.

I can suggest several minor modifications:

Is the number 1 necessary next to the authors' names if both are from the same department?
Elaborate on why, among the many types of NNs, you specifically chose an RNN.
Elaborate a bit more on how your loss function differs from others, as I found it very important.
A period is missing in Line 75. Also, the "e" should be capitalized in Line 252, in the heading "4. Experimental settings". Revise the whole manuscript for possible spelling and grammar mistakes.
Line 322: check whether "the" should be used with "Figure".
Can the references be improved with more up-to-date works from the past few years, perhaps including similar uses of RNNs?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript presents a neural network-based framework for human trajectory prediction using only a single frame of pose information and initial velocity. This minimalist approach stands in contrast to the dominant paradigm of multi-frame temporal input, and the authors report competitive or superior performance on benchmark datasets. While the proposed method is interesting and technically detailed, the manuscript requires major revisions to clarify its assumptions, more rigorously position itself against prior work, and improve the presentation in key areas.

The paper claims to be the first to predict human trajectory from a single frame and velocity. However, related research has previously addressed motion intent estimation from static pose, pose-conditioned future sampling and GAN-based or probabilistic single-timestep predictions. There is no mention of Social-LSTM, Trajectron++ or PECNet in this paper. These efforts, while not directly identical, show that low-input trajectory forecasting is not entirely unexplored. Reframe the contribution to clarify what is genuinely novel — e.g., the specific dual-GRU architecture, the quaternion-based loss structure, or the inference-time constraints. The current phrasing may mislead readers.
The paper emphasizes that no past motion is needed — yet also assumes that velocity is available at inference. This creates a contradiction: velocity cannot exist without history unless it is inferred from the pose itself (e.g., body orientation) or it is externally provided (e.g., sensors, IMU). Explain in clear terms how the velocity is obtained at test time, whether this velocity is ground truth or estimated and how the approach generalizes to real-world settings without full MoCap data.
The architecture is well described, but the actual I/O specification is missing or scattered. It is unclear what precise format the "single frame" pose uses (joint positions? quaternions?), whether the velocity is 2D, 3D, or relative to local orientation, and what the model ultimately outputs (e.g., root joint path, full-body pose trajectory). It would be beneficial to include a clear table or figure summarizing the inputs (modalities, formats, dimensions), the outputs (coordinates, angles, sequence length), and the inference-time vs. training-time conditions.
While technically interesting, the loss function section needs to be made more accessible by reducing mathematical density and walking the reader through the intuition behind the formulation.
Were the baseline models trained using the same pose+velocity inputs? This affects the validity of the comparisons. Add the explanation to the manuscript as well.
A short section discussing assumptions (e.g., pose quality, sensor availability) and failure cases should be added to the Limitations section.
Velocities and distances are discussed without clear units (m/s, frames/s?). Ensure all distances and velocities include appropriate units.
There are some minor technical errors and typos that should be addressed:

There is a formatting inconsistency in Section 4: the title “experimental settings” (line 252) should be capitalized as “Experimental Settings” for consistency with all other section titles.
Duplicate joint index in line 172: “joints 5, 6, 7, 8, and 8” → likely intended as “and 9”.
Typos like “quarternion” should be corrected to “quaternion”.
Sentence structure in several sections is somewhat confusing and should be broken into clearer statements.
Figures (especially Figure 4 and Figure 6) would benefit from more explanatory captions to improve standalone clarity. The detailed explanations can be part of the section and not the figure caption.
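
To illustrate the velocity and unit concerns raised above, the following is a minimal sketch and not the authors' method. It assumes root-joint positions in metres and a known capture rate in frames per second; the function names and the example values are hypothetical. It shows that a finite-difference velocity needs at least two frames, and that a per-frame displacement only becomes a velocity in m/s once the frame rate is stated.

```python
import math
from typing import Sequence, Tuple


def finite_difference_velocity(prev_pos: Sequence[float],
                               curr_pos: Sequence[float],
                               fps: float) -> Tuple[float, ...]:
    """Estimate velocity in m/s from two consecutive positions given in metres.

    Two frames are required: this is the "history" a velocity input relies on
    unless the velocity comes from an external sensor.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if len(prev_pos) != len(curr_pos):
        raise ValueError("positions must have the same dimension")
    dt = 1.0 / fps  # seconds between consecutive frames
    return tuple((c - p) / dt for p, c in zip(prev_pos, curr_pos))


def per_frame_to_m_per_s(displacement_per_frame: Sequence[float],
                         fps: float) -> Tuple[float, ...]:
    """Convert a displacement in metres per frame to metres per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return tuple(d * fps for d in displacement_per_frame)


def speed(velocity: Sequence[float]) -> float:
    """Euclidean norm of a velocity vector (same units as its components)."""
    return math.sqrt(sum(c * c for c in velocity))


if __name__ == "__main__":
    # Hypothetical example: the root joint moves 2 cm along x between two
    # frames captured at 50 frames per second.
    v = finite_difference_velocity((0.00, 0.0, 0.9), (0.02, 0.0, 0.9), fps=50.0)
    print("velocity [m/s]:", v)
    print("speed [m/s]:", speed(v))
```

Reporting the frame rate together with the unit of each quantity, as in the function signatures above, would remove the ambiguity between per-frame and per-second velocities.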

This work presents an intriguing approach and has the potential to contribute to the addressed field of research. However, major revision of the paper is necessary before publication.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
