A Review of Pedestrian Trajectory Prediction Methods Based on Deep Learning Technology
Abstract
1. Introduction
- A comprehensive and multi-faceted review of deep learning-based pedestrian trajectory prediction models.
- A systematic categorization and summary of datasets and evaluation metrics used in pedestrian trajectory prediction.
- Insights into future research directions and key challenges in the field.
2. Related Work
2.1. Description of the Pedestrian Trajectory Prediction Problem
2.2. Introduction of the Main Methods of Pedestrian Trajectory Prediction
2.2.1. Introduction to Trajectory Prediction Methods Based on Probabilistic Statistical Models
2.2.2. Introduction to Trajectory Prediction Methods Based on Deep Learning Models
3. Analysis and Comparison of Pedestrian Trajectory Prediction Methods
3.1. Analysis and Comparison of Methods Based on the RNN Model
3.2. Analysis and Comparison of Methods Based on the GAN Model
3.3. Analysis and Comparison of Methods Based on the GCN Model
3.4. Analysis and Comparison of Methods Based on the Transformer Model
4. Analysis of Multi-Modal Pedestrian Trajectory Prediction
5. Summary of Pedestrian Trajectory Prediction Methods
6. Datasets and Performance Evaluation
6.1. Datasets
6.2. Evaluation Metrics
6.3. Performance Comparison
7. Rethinking Future Directions
7.1. Semantic Scene Comprehension
7.2. Robust Integration and Transferability
7.3. Creating Comprehensive Benchmarks and Datasets
7.4. Adaptive Prediction with Deep Learning and Reinforcement Learning
7.5. Balancing Precision and Efficiency
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Luber, M.; Stork, J.A.; Tipaldi, G.D.; Arras, K.O. People tracking with human motion predictions from social forces. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–7 May 2010; pp. 464–469.
- Dragan, A.D.; Ratliff, N.D.; Srinivasa, S.S. Manipulation planning with goal sets using constrained trajectory optimization. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 4582–4588.
- Kuderer, M.; Kretzschmar, H.; Sprunk, C.; Burgard, W. Feature-based prediction of trajectories for socially compliant navigation. In Robotics: Science and Systems VIII; The MIT Press: Cambridge, MA, USA, 2012; Volume 8.
- Mainprice, J.; Hayne, R.; Berenson, D. Goal set inverse optimal control and iterative replanning for predicting human reaching motions in shared workspaces. IEEE Trans. Robot. 2016, 32, 897–908.
- Trautman, P.; Krause, A. Unfreezing the robot: Navigation in dense, interacting crowds. In Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 18–22 October 2010; pp. 797–803.
- Ziebart, B.D.; Ratliff, N.; Gallagher, G.; Mertz, C.; Peterson, K.; Bagnell, J.A.; Hebert, M.; Dey, A.K.; Srinivasa, S. Planning-based prediction for pedestrians. In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, USA, 11–15 October 2009; pp. 3931–3936.
- Rehder, E.; Wirth, F.; Lauer, M.; Stiller, C. Pedestrian prediction by planning using deep neural networks. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 5903–5908.
- Toroyan, T. Global status report on road safety. Inj. Prev. 2009, 15, 286.
- Helbing, D.; Molnar, P. Social force model for pedestrian dynamics. Phys. Rev. E 1995, 51, 4282.
- Pellegrini, S.; Ess, A.; Van Gool, L. Improving data association by joint modeling of pedestrian trajectories and groupings. In Proceedings of the Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; Proceedings, Part I 11. Springer: Berlin/Heidelberg, Germany, 2010; pp. 452–465.
- Koppula, H.S.; Saxena, A. Anticipating human activities using object affordances for reactive robotic response. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 14–29.
- Rudenko, A.; Palmieri, L.; Herman, M.; Kitani, K.M.; Gavrila, D.M.; Arras, K.O. Human motion trajectory prediction: A survey. Int. J. Robot. Res. 2020, 39, 895–935.
- Xu, Y.; Piao, Z.; Gao, S. Encoding crowd interaction with deep neural network for pedestrian trajectory prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5275–5284.
- Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social LSTM: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 961–971.
- Chen, Z. Bayesian filtering: From Kalman filters to particle filters, and beyond. Statistics 2003, 182, 1–69.
- Lauritzen, S.L.; Spiegelhalter, D.J. Local computations with probabilities on graphical structures and their application to expert systems. J. R. Stat. Soc. Ser. B Methodol. 1988, 50, 157–194.
- Ferguson, S.; Luders, B.; Grande, R.C.; How, J.P. Real-time predictive modeling and robust avoidance of pedestrians with uncertain, changing intentions. In Algorithmic Foundations of Robotics XI: Selected Contributions of the Eleventh International Workshop on the Algorithmic Foundations of Robotics; Springer: Berlin/Heidelberg, Germany, 2015; pp. 161–177.
- Ghahramani, Z.; Jordan, M. Factorial hidden Markov models. Adv. Neural Inf. Process. Syst. 1995, 8, 1–8.
- Kooij, J.F.P.; Schneider, N.; Flohr, F.; Gavrila, D.M. Context-based pedestrian path prediction. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part VI 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 618–633.
- Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45.
- Nelder, J.A.; Wedderburn, R.W. Generalized linear models. J. R. Stat. Soc. Ser. A Stat. Soc. 1972, 135, 370–384.
- Tay, M.K.C.; Laugier, C. Modelling smooth paths using Gaussian processes. In Proceedings of the Field and Service Robotics: Results of the 6th International Conference, Chamonix, France, 9–12 July 2007; Springer: Berlin/Heidelberg, Germany, 2008; pp. 381–390.
- Wang, J.M.; Fleet, D.J.; Hertzmann, A. Gaussian process dynamical models for human motion. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 30, 283–298.
- Williams, C.K. Prediction with Gaussian processes: From linear regression to linear prediction and beyond. In Learning in Graphical Models; Springer: Berlin/Heidelberg, Germany, 1998; pp. 599–621.
- Quinonero-Candela, J.; Rasmussen, C.E. A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 2005, 6, 1939–1959.
- Rasmussen, C.E. Gaussian processes in machine learning. In Summer School on Machine Learning; Springer: Berlin/Heidelberg, Germany, 2003; pp. 63–71.
- Akaike, H. Fitting autoregressive models for prediction. In Selected Papers of Hirotugu Akaike; Springer: Berlin/Heidelberg, Germany, 1969; pp. 131–135.
- Jenkins, G.M.; Priestley, M. The spectral analysis of time-series. J. R. Stat. Soc. Ser. B Methodol. 1957, 19, 1–12.
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
- Graves, A.; Jaitly, N. Towards end-to-end speech recognition with recurrent neural networks. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 1764–1772.
- Chorowski, J.; Bahdanau, D.; Cho, K.; Bengio, Y. End-to-end continuous speech recognition using attention-based recurrent NN: First results. arXiv 2014, arXiv:1412.1602.
- Chung, J.; Kastner, K.; Dinh, L.; Goel, K.; Courville, A.C.; Bengio, Y. A recurrent latent variable model for sequential data. Adv. Neural Inf. Process. Syst. 2015, 28, 1–9.
- Graves, A.; Mohamed, A.-r.; Hinton, G. Speech recognition with deep recurrent neural networks. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 6645–6649.
- Vinyals, O.; Toshev, A.; Bengio, S.; Erhan, D. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3156–3164.
- Karpathy, A.; Joulin, A.; Fei-Fei, L.F. Deep fragment embeddings for bidirectional image sentence mapping. Adv. Neural Inf. Process. Syst. 2014, 27, 1–9.
- Yoo, D.; Park, S.; Lee, J.Y.; Paek, A.S.; So Kweon, I. AttentionNet: Aggregating weak directions for accurate object detection. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2659–2667.
- Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhutdinov, R.; Zemel, R.; Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. arXiv 2015, arXiv:1502.03044.
- Donahue, J.; Anne Hendricks, L.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; Darrell, T. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2625–2634.
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
- Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. arXiv 2014, arXiv:1409.3215.
- Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
- Zheng, S.; Jayasumana, S.; Romera-Paredes, B.; Vineet, V.; Su, Z.; Du, D.; Huang, C.; Torr, P.H. Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1529–1537.
- Cao, C.; Liu, X.; Yang, Y.; Yu, Y.; Wang, J.; Wang, Z.; Huang, Y.; Wang, L.; Huang, C.; Xu, W.; et al. Look and think twice: Capturing top-down visual attention with feedback convolutional neural networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2956–2964.
- Gregor, K.; Danihelka, I.; Graves, A.; Rezende, D.; Wierstra, D. DRAW: A recurrent neural network for image generation. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1462–1471.
- Xiao, T.; Xu, Y.; Yang, K.; Zhang, J.; Peng, Y.; Zhang, Z. The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 842–850.
- Yue-Hei Ng, J.; Hausknecht, M.; Vijayanarasimhan, S.; Vinyals, O.; Monga, R.; Toderici, G. Beyond short snippets: Deep networks for video classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4694–4702.
- Pinheiro, P.; Collobert, R. Recurrent convolutional neural networks for scene labeling. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 82–90.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 1–9.
- Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 1–11.
- Liu, Q.; Wu, S.; Wang, L.; Tan, T. Predicting the next location: A recurrent model with spatial and temporal contexts. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016.
- Lee, N.; Choi, W.; Vernaza, P.; Choy, C.B.; Torr, P.H.; Chandraker, M. DESIRE: Distant future prediction in dynamic scenes with interacting agents. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 336–345.
- Su, H.; Zhu, J.; Dong, Y.; Zhang, B. Forecast the plausible paths in crowd scenes. In Proceedings of the IJCAI, Melbourne, Australia, 19–25 August 2017; Volume 1, p. 2.
- Hasan, I.; Setti, F.; Tsesmelis, T.; Del Bue, A.; Galasso, F.; Cristani, M. MX-LSTM: Mixing tracklets and vislets to jointly forecast trajectories and head poses. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6067–6076.
- Vemula, A.; Muelling, K.; Oh, J. Social attention: Modeling attention in human crowds. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 4601–4607.
- Fernando, T.; Denman, S.; Sridharan, S.; Fookes, C. Soft + hardwired attention: An LSTM framework for human trajectory prediction and abnormal event detection. Neural Netw. 2018, 108, 466–478.
- Xue, H.; Huynh, D.Q.; Reynolds, M. SS-LSTM: A hierarchical LSTM model for pedestrian trajectory prediction. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1186–1194.
- Zhang, P.; Ouyang, W.; Zhang, P.; Xue, J.; Zheng, N. SR-LSTM: State refinement for LSTM towards pedestrian trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12085–12094.
- Lisotto, M.; Coscia, P.; Ballan, L. Social and scene-aware trajectory prediction in crowded spaces. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.
- Song, X.; Chen, K.; Li, X.; Sun, J.; Hou, B.; Cui, Y.; Zhang, B.; Xiong, G.; Wang, Z. Pedestrian trajectory prediction based on deep convolutional LSTM network. IEEE Trans. Intell. Transp. Syst. 2020, 22, 3285–3302.
- Salzmann, T.; Ivanovic, B.; Chakravarty, P.; Pavone, M. Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XVIII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 683–700.
- Xu, Y.; Fu, Y. Adapting to length shift: Flexilength network for trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 15226–15237.
- Tang, X.; Kan, M.; Shan, S.; Ji, Z.; Bai, J.; Chen, X. HPNet: Dynamic trajectory forecasting with historical prediction attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 15261–15270.
- Dong, Y.; Wang, L.; Zhou, S.; Tang, W.; Hua, G.; Sun, C. AFC-RNN: Adaptive Forgetting-Controlled Recurrent Neural Network for Pedestrian Trajectory Prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 10177–10191.
- Zamboni, S.; Kefato, Z.T.; Girdzijauskas, S.; Norén, C.; Dal Col, L. Pedestrian trajectory prediction with convolutional neural networks. Pattern Recognit. 2022, 121, 108252.
- Wang, D.; Liu, H.; Wang, N.; Wang, Y.; Wang, H.; McLoone, S. SEEM: A sequence entropy energy-based model for pedestrian trajectory all-then-one prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1070–1086.
- Khel, M.H.K.; Greaney, P.; McAfee, M.; Moffett, S.; Meehan, K. Pedestrian trajectory prediction using BiLSTM with spatial-temporal attention and sparse motion fields. In Proceedings of the 2023 34th Irish Signals and Systems Conference (ISSC), Dublin, Ireland, 13–14 June 2023; pp. 1–7.
- Yang, J.; Chen, Y.; Du, S.; Chen, B.; Principe, J.C. IA-LSTM: Interaction-aware LSTM for pedestrian trajectory prediction. IEEE Trans. Cybern. 2024, 54, 3904–3917.
- Arjovsky, M.; Bottou, L. Towards principled methods for training generative adversarial networks. arXiv 2017, arXiv:1701.04862.
- Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training GANs. Adv. Neural Inf. Process. Syst. 2016, 29, 1–10.
- Gupta, A.; Johnson, J.; Fei-Fei, L.; Savarese, S.; Alahi, A. Social GAN: Socially acceptable trajectories with generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2255–2264.
- Li, J.; Ma, H.; Tomizuka, M. Conditional generative neural system for probabilistic trajectory prediction. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 6150–6156.
- Kothari, P.; Alahi, A. Human trajectory prediction using adversarial loss. In Proceedings of the 19th Swiss Transport Research Conference, Ascona, Switzerland, 15–17 May 2019; pp. 15–17.
- Kosaraju, V.; Sadeghian, A.; Martín-Martín, R.; Reid, I.; Rezatofighi, H.; Savarese, S. Social-BiGAT: Multimodal trajectory forecasting using Bicycle-GAN and graph attention networks. Adv. Neural Inf. Process. Syst. 2019, 32, 1–10.
- Amirian, J.; Hayet, J.B.; Pettré, J. Social ways: Learning multi-modal distributions of pedestrian trajectories with GANs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019.
- Sadeghian, A.; Kosaraju, V.; Sadeghian, A.; Hirose, N.; Rezatofighi, H.; Savarese, S. SoPhie: An attentive GAN for predicting paths compliant to social and physical constraints. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 1349–1358.
- Sun, H.; Zhao, Z.; He, Z. Reciprocal learning networks for human trajectory prediction. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 7416–7425.
- Zou, X.; Sun, B.; Zhao, D.; Zhu, Z.; Zhao, J.; He, Y. Multi-modal pedestrian trajectory prediction for edge agents based on spatial-temporal graph. IEEE Access 2020, 8, 83321–83332.
- Zhong, J.; Sun, H.; Cao, W.; He, Z. Pedestrian motion trajectory prediction with stereo-based 3D deep pose estimation and trajectory learning. IEEE Access 2020, 8, 23480–23486.
- Yang, B.; He, C.; Wang, P.; Chan, C.-Y.; Liu, X.; Chen, Y. TPPO: A novel trajectory predictor with pseudo oracle. arXiv 2020, arXiv:2002.01852.
- Zhou, Z.; Huang, G.; Su, Z.; Li, Y.; Hua, W. Dynamic attention-based CVAE-GAN for pedestrian trajectory prediction. IEEE Robot. Autom. Lett. 2022, 8, 704–711.
- An, H.; Liu, M.; Wang, X.; Zhang, W.; Gong, J. Multi attention generative adversarial network for pedestrian trajectory prediction based on spatial gridding. Automot. Innov. 2024, 7, 443–455.
- Li, Y.; Zhang, C.; Zhou, J.; Zhou, S. POI-GAN: A pedestrian trajectory prediction method for service scenarios. IEEE Access 2024, 12, 53293–53305.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Yi, S.; Li, H.; Wang, X. Pedestrian behavior understanding and prediction with deep neural networks. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 263–279.
- Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80.
- Abdullah, M.; He, J.; Wang, K. Weather-aware fiber-wireless traffic prediction using graph convolutional networks. IEEE Access 2022, 10, 95908–95918.
- Zhang, L.; She, Q.; Guo, P. Stochastic trajectory prediction with social graph network. arXiv 2019, arXiv:1907.10233.
- Mohamed, A.; Qian, K.; Elhoseiny, M.; Claudel, C. Social-STGCNN: A social spatio-temporal graph convolutional neural network for human trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 14424–14432.
- Li, K.; Eiffert, S.; Shan, M.; Gomez-Donoso, F.; Worrall, S.; Nebot, E. Attentional-GCNN: Adaptive pedestrian trajectory prediction towards generic autonomous vehicle use cases. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 14241–14247.
- Liu, C.; Chen, Y.; Liu, M.; Shi, B.E. AVGCN: Trajectory prediction using graph convolutional networks guided by human attention. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 14234–14240.
- Lian, J.; Ren, W.; Li, L.; Zhou, Y.; Zhou, B. PTP-STGCN: Pedestrian trajectory prediction based on a spatio-temporal graph convolutional neural network. Appl. Intell. 2023, 53, 2862–2878.
- Wang, C.; Cai, S.; Tan, G. GraphTCN: Spatio-temporal interaction modeling for human trajectory prediction. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 3450–3459.
- Sun, J.; Jiang, Q.; Lu, C. Recursive social behavior graph for trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 660–669.
- Rainbow, B.A.; Men, Q.; Shum, H.P. Semantics-STGCNN: A semantics-guided spatial-temporal graph convolutional network for multi-class trajectory prediction. In Proceedings of the 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia, 17–20 October 2021; pp. 2959–2966.
- Shi, L.; Wang, L.; Long, C.; Zhou, S.; Zhou, M.; Niu, Z.; Hua, G. SGCN: Sparse graph convolution network for pedestrian trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 8994–9003.
- Zhou, R.; Zhou, H.; Gao, H.; Tomizuka, M.; Li, J.; Xu, Z. Grouptron: Dynamic multi-scale graph convolutional networks for group-aware dense crowd trajectory forecasting. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 805–811.
- Su, Y.; Du, J.; Li, Y.; Li, X.; Liang, R.; Hua, Z.; Zhou, J. Trajectory forecasting based on prior-aware directed graph convolutional neural network. IEEE Trans. Intell. Transp. Syst. 2022, 23, 16773–16785.
- Bhujel, N.; Yau, W.Y. Disentangling crowd interactions for pedestrians trajectory prediction. IEEE Robot. Autom. Lett. 2023, 8, 3078–3085.
- Mi, J.; Zhang, X.; Zeng, H.; Wang, L. DERGCN: Dynamic-evolving graph convolutional networks for human trajectory prediction. Neurocomputing 2024, 569, 127117.
- Feng, A.; Gong, J.; Wang, N. Pedestrian trajectory prediction algorithm based on graph convolution and convolution. J. Northeast. Univ. Nat. Sci. 2024, 45, 1529.
- Yu, C.; Ma, X.; Ren, J.; Zhao, H.; Yi, S. Spatio-temporal graph transformer networks for pedestrian trajectory prediction. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 507–523.
- Yin, Z.; Liu, R.; Xiong, Z.; Yuan, Z. Multimodal transformer networks for pedestrian trajectory prediction. In Proceedings of the IJCAI, Montreal, QC, Canada, 19–27 August 2021; pp. 1259–1265.
- Yuan, Y.; Weng, X.; Ou, Y.; Kitani, K.M. AgentFormer: Agent-aware transformers for socio-temporal multi-agent forecasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 9813–9823.
- Li, L.; Pagnucco, M.; Song, Y. Graph-based spatial transformer with memory replay for multi-future pedestrian trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 2231–2241.
- Gu, T.; Chen, G.; Li, J.; Lin, C.; Rao, Y.; Zhou, J.; Lu, J. Stochastic trajectory prediction via motion indeterminacy diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 17113–17122.
- Liu, Y.; Yao, L.; Li, B.; Wang, X.; Sammut, C. Social graph transformer networks for pedestrian trajectory prediction in complex social scenarios. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Birmingham, UK, 21–25 October 2022; pp. 1339–1349.
- Shi, L.; Wang, L.; Zhou, S.; Hua, G. Trajectory unified transformer for pedestrian trajectory prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 4–6 October 2023; pp. 9675–9684.
- Tamaru, R.; Li, P.; Ran, B. Enhancing pedestrian trajectory prediction with crowd trip information. arXiv 2024, arXiv:2409.15224.
- Lin, X.; Liang, T.; Lai, J.; Hu, J.F. Progressive pretext task learning for human trajectory prediction. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 197–214.
- Wang, C.; Wang, J.; Gao, W.; Guo, L. SIAT: Pedestrian trajectory prediction via social interaction-aware transformer. Complex Intell. Syst. 2025, 11, 335.
- Teeti, I.; Thomas, A.; Monga, M.; Kumar, S.; Singh, U.; Bradley, A.; Banerjee, B.; Cuzzolin, F. ASTRA: A scene-aware transformer-based model for trajectory prediction. arXiv 2025, arXiv:2501.09878.
- Liang, R.; Li, Y.; Zhou, J.; Li, X. STGlow: A flow-based generative framework with dual-graphormer for pedestrian trajectory prediction. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 16504–16517.
- Mangalam, K.; Girase, H.; Agarwal, S.; Lee, K.H.; Adeli, E.; Malik, J.; Gaidon, A. It is not the journey but the destination: Endpoint conditioned trajectory prediction. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part II 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 759–776.
- Mangalam, K.; An, Y.; Girase, H.; Malik, J. From goals, waypoints & paths to long term human trajectory forecasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 15233–15242.
- Gu, J.; Sun, C.; Zhao, H. DenseTNT: End-to-end trajectory prediction from dense goal sets. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 15303–15312.
- Guo, K.; Liu, W.; Pan, J. End-to-end trajectory distribution prediction based on occupancy grid maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2242–2251.
- Korbmacher, R.; Tordeux, A. Review of pedestrian trajectory prediction methods: Comparing deep learning and knowledge-based approaches. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24126–24144.
- Pellegrini, S.; Ess, A.; Schindler, K.; Van Gool, L. You’ll never walk alone: Modeling social behavior for multi-target tracking. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 261–268.
- Lerner, A.; Chrysanthou, Y.; Lischinski, D. Crowds by example. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2007; Number 3; pp. 655–664.
- Robicquet, A.; Sadeghian, A.; Alahi, A.; Savarese, S. Learning social etiquette: Human trajectory understanding in crowded scenes. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part VIII 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 549–565.
- Lian, J.; Wang, X. Pedestrian trajectory prediction based on human-vehicle interaction. China J. Highw. Transp. 2021, 34, 215–223.
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11621–11631.
- Chang, M.F.; Lambert, J.; Sangkloy, P.; Singh, J.; Bak, S.; Hartnett, A.; Wang, D.; Carr, P.; Lucey, S.; Ramanan, D.; et al. Argoverse: 3D tracking and forecasting with rich maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8748–8757.
- Rasouli, A.; Kotseruba, I.; Kunic, T.; Tsotsos, J.K. PIE: A large-scale dataset and models for pedestrian intention estimation and trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6262–6271.
- Kothari, P.; Kreiss, S.; Alahi, A. Human trajectory forecasting in crowds: A deep learning perspective. IEEE Trans. Intell. Transp. Syst. 2021, 23, 7386–7400.
- Deo, N.; Trivedi, M.M. Multi-modal trajectory prediction of surrounding vehicles with maneuver based LSTMs. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Suzhou, China, 26–30 June 2018; pp. 1179–1184.
- Koutrintzes, D.; Spyrou, E.; Mathe, E.; Mylonas, P. A multimodal fusion approach for human activity recognition. Int. J. Neural Syst. 2023, 33, 2350002.
- Obaid, L.; Hamad, K.; Al-Ruzouq, R.; Dabous, S.A.; Ismail, K. Automating the estimation of turning movement rates at multilane roundabouts using unmanned aerial vehicles and deep learning. Green Energy Intell. Transp. 2025, 100340.
- Li, Z.; Gong, C.; Lin, Y.; Li, G.; Wang, X.; Lu, C.; Wang, M.; Chen, S.; Gong, J. Continual driver behaviour learning for connected vehicles and intelligent transportation systems: Framework, survey and challenges. Green Energy Intell. Transp. 2023, 2, 100103.

| Method | Ref. | Contribution | Strengths | Limitations |
|---|---|---|---|---|
| S-LSTM | [14] | Introduced social pooling to share hidden states among pedestrians. | Social pooling | Local interaction, ignoring contextual information. |
| ST-RNN | [54] | Used a transfer matrix to capture local spatio-temporal contexts. | Models time-specific and location-specific transitions. | Predefined tensor structures lack flexibility. |
| DESIRE | [55] | An encoder–decoder framework with a VAE for diverse trajectory generation. | Inverse optimal control and long-term prediction. | Spatial constraints. |
| S-aware LSTM | [56] | Embedded prior knowledge via a recursive Gaussian process for crowd dynamics. | Suitable for unstructured scenarios. | Weak generalization. |
| MX-LSTM | [57] | Integrated head orientation and location information into LSTM. | Long-term prediction, joint prediction. | Limited application scenarios. |
| S-Attention | [58] | Used a spatio-temporal graph with soft attention to model pedestrian influences. | Social interaction, attention mechanism. | Maintained the entire image. |
| S-Scene-LSTM | [60] | A hierarchical LSTM combining individual, social, and scene layout information. | Scene layout. | Short trajectory prediction is not accurate. |
| SR-LSTM | [61] | Refined pedestrian states via a message-passing mechanism and attention. | Joint prediction. | No consideration of static scene context information. |
| SNS-LSTM | [62] | Extended S-LSTM with navigation and language pooling layers. | Multi-mode input. | Complex interaction. |
| DCLN | [63] | A deep convolutional LSTM network for learning spatio-temporal interactions. | Wide coverage. | The model is complex. |
| Trajectron++ | [64] | A modular, graph-based model integrating agent dynamics and heterogeneous data. | Dynamic constraints, scene information, probability distribution. | Weak generalization. |
| 2D CNN | [68] | Applied a 2D CNN architecture for trajectory prediction with data augmentation. | Simple yet effective; benefits from standard CNN augmentation techniques. | Lacks inherent sequential modeling capability. |
| SEEM | [69] | A generator-energy network based on sequence entropy energy. | Probability distribution. | Insufficient consideration of scene semantic information. |
| SS-BLSTAS | [70] | A spatio-temporal model considering obstacles and social interactions. | CNN combined with attention, comprehensive interaction. | Poor generalization ability. |
| IA-LSTM | [71] | Combined spatial and interaction features using a Correntropy-based mechanism. | Enables information sharing among pedestrians and captures complex interactions in crowds. | Mediocre performance in diverse scenarios and emergency situations. |
| FlexiLength-LSTM | [65] | Length-adaptive LSTM with curriculum gate to counter distribution shift under variable observation lengths. | Single model for any observation length; small parameter count; real-time; clear ADE/FDE gains. | Uni-directional; long-horizon drift; no explicit social pooling. |
| HPNet | [66] | Bi-LSTM whose hidden states are refined by its own past predictions via historical-prediction attention and confidence gate. | Pure RNN; better long-term FDE; online capable; plug-and-play. | Extra cache/compute; rudimentary social modeling. |
| AFC-RNN | [67] | Adaptive forgetting-controlled RNN that uses self-attention memory factors to tune the forget gate explicitly. | Compact; keeps recurrent pipeline. | Still uni-directional; limited scene context. |
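Several of the RNN-based models above build on the social pooling idea introduced by S-LSTM [14]: each pedestrian's neighbors are binned into a grid centered on that pedestrian, and the neighbors' hidden states are summed per cell. An illustrative NumPy sketch of this mechanism (the grid size, cell width, and function name are assumed for illustration, not taken from any of the papers):

```python
import numpy as np

def social_pooling(positions, hidden_states, grid_size=4, cell=0.5):
    """S-LSTM-style social pooling tensor for each pedestrian.

    positions:     (N, 2) pedestrian coordinates at one time step.
    hidden_states: (N, D) per-pedestrian LSTM hidden states.
    Returns (N, grid_size, grid_size, D): for pedestrian i, the summed
    hidden states of neighbors falling in each cell of a grid around i.
    """
    n, d = hidden_states.shape
    pooled = np.zeros((n, grid_size, grid_size, d))
    half = grid_size * cell / 2.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx, dy = positions[j] - positions[i]
            if abs(dx) >= half or abs(dy) >= half:
                continue  # neighbor j lies outside pedestrian i's grid
            gx = int((dx + half) / cell)
            gy = int((dy + half) / cell)
            pooled[i, gx, gy] += hidden_states[j]  # sum states per cell
    return pooled
```

The resulting grid tensor is typically flattened and concatenated with the pedestrian's own embedding before the next recurrent step, which is how neighboring hidden states are shared.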

| Method | Ref. | Contribution | Strengths | Limitations |
|---|---|---|---|---|
| Social-GAN | [74] | Combines LSTM with GANs to generate realistic, socially plausible trajectories. | Multi-modal output | Limited interaction modeling, unstable training. |
| CGNS | [75] | Enhances Social-GAN by adding static context and attention mechanism, improving feature extraction. | Probability distribution, joint optimization | Complex structure. |
| FSGAN | [76] | Improved GANs with physical and social attention, outperforming Social-GANs in some datasets. | Diverse trajectory generation | Weak social interaction modeling. |
| S-BiGAT | [77] | Uses graphical attention (GAT) with a generator and LSTM decoder to produce multi-modal predictions. | Multi-modal output | Strong feature dependency. |
| Social-Ways | [78] | Replaces L2 loss with information loss, reducing mode collapse and improving training stability. | Improved generalization | Limited applicability. |
| Sophie | [79] | Uses attention mechanisms and GANs with LSTM to provide more interpretable and accurate trajectory predictions. | Incorporates scene semantics | Requires manual feature sorting. |
| Forward-Backward | [80] | Uses reciprocal learning and adversarial networks to refine predictions. | Reciprocal constraints | Slow fitting. |
| STG-GAN | [81] | A scalable spatio-temporal GAN, using attention for global interactions. | Scalable interaction modeling | Misconnection between images. |
| Depth-PoseGAN | [82] | A 3D pose and trajectory prediction method based on GANs for motion prediction in 3D space | Effective in 3D prediction | High computational complexity. |
| TPPO | [83] | Uses GANs with social attention and a min–max game for trajectory refinement. | Latent variable exploration | Lacks robustness in interaction modeling. |
| DACG | [84] | A CVAE-GAN model with dynamic attention to model interactions and intentions. | Dynamic encoding, multi-modal output | Complex model structure. |
| SGMA-GAN | [85] | Uses GANs with temporal and spatial attention for dynamic interaction modeling | State stability, dynamic interaction | High computational complexity. |
| POI-GAN | [86] | Info-GAN with POI social force layer and field-of-view mask to model service-scene attraction. | Handles POI attraction; FOV filters unrealistic links; faster convergence. | Extra POI labels needed; still frame-based social pool. |

| Method | Ref. | Contribution | Strengths | Limitations |
|---|---|---|---|---|
| Behavior-CNN | [88] | First CNN model applied to pedestrian trajectory prediction, establishing the viability of convolutional approaches for this task. | Pioneering CNN application | Local receptive fields; no long-term dependencies |
| Social Graph Network | [91] | Encoder–decoder framework with dynamic adjacency matrix that learns evolving social interactions through graph structures. | Dynamic social modeling | Computationally intensive; data-dependent |
| Social-STGCNN | [92] | Integrates graph convolutional networks with temporal convolutional networks to jointly model social relationships and spatio-temporal dependencies. | Compact architecture; fast inference | Computationally demanding; low interpretability |
| Attention-GCNN | [93] | Enhances Social-STGCNN with attention mechanisms to dynamically weight the importance of different pedestrian interactions. | Selective attention; improved accuracy | Black-box nature |
| AVGCN | [94] | Combines attention mechanisms with variational trajectory prediction, incorporating visual field constraints for more realistic interaction modeling. | Field-of-view constraints; uncertainty handling | Poor crowd performance; data-sensitive |
| PTP-STGCN | [95] | Employs spatio-temporal graph convolutional networks with T-transformer model to capture both spatial interactions and temporal dependencies. | Flexible architecture | High complexity |
| GraphTCN | [96] | Unified time-space graph model utilizing CNNs with multi-head attention to capture interactions and generate diverse potential trajectories. | Efficient reasoning; adaptive gating | Limited spatial modeling |
| RSBG | [97] | Recursively extracts social relationships to construct dynamic social behavior graphs that evolve over time. | Social norm modeling | Parameter-sensitive |
| Semantics-STGCNN | [98] | Incorporates semantic category information into graph convolutional networks to enable trajectory prediction for diverse road users. | Multi-class prediction | High computation; limited cross-category interactions |
| SGCN | [99] | Focuses on sparse-directed interactions and motion trend modeling to reduce redundant computations in crowded scenes. | Sparse interaction efficiency | Subjective interactions |
| Grouptron | [100] | Multi-scale hierarchical framework incorporating scene-level, group-level, and individual-level graphs for comprehensive pedestrian dynamics modeling. | Multi-scale representation | Low interpretability |
| Directed Graph CNN | [101] | Models asymmetric social interactions using directed graph topologies based on visual perspective, movement direction, and velocity relationships. | Asymmetric social modeling | Limited scenario validation |
| DGCN | [102] | Decouples spatial and temporal factors into separate latent spaces with specialized regularization for improved crowd interaction modeling. | Disentangled learning | Data- and computation-intensive |
| DERGCN | [103] | Dynamic-evolving GCN: learnable edge gates add/remove neighbors online, keeping only temporally relevant connections. | Pure GCN; no hand-crafted rules. | Edge-gate training needs careful regularization; density hyper-param-sensitive. |
| CNN-GCN Fusion | [104] | Lightweight multi-scale CNN and a single GCN layer fuse trajectory nodes with scene grid cells in one forward pass. | Reduced parameter count | Sensitive to scene-grid resolution; a single GCN layer may under-fit dense crowds. |
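The graph-based models above all start by constructing an adjacency matrix over the pedestrians in a scene. An illustrative NumPy construction is sketched below: edges are weighted by inverse distance within a threshold (the 2 m threshold and 1/d weighting are assumed values, similar in spirit to the kernel functions used by Social-STGCNN [92]), followed by the symmetric normalization standard in GCN layers:

```python
import numpy as np

def distance_adjacency(positions, threshold=2.0):
    """Normalized adjacency matrix from pairwise pedestrian distances.

    positions: (N, 2) coordinates. Pairs closer than `threshold` get
    edge weight 1/d; the result is D^{-1/2} (A + I) D^{-1/2}.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)           # (N, N) pairwise distances
    weights = np.where((dist > 0) & (dist < threshold),
                       1.0 / np.maximum(dist, 1e-6), 0.0)
    a_hat = weights + np.eye(len(positions))       # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degree^{-1/2}
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
```

A GCN layer then propagates features as `A_norm @ X @ W`; the methods in the table differ mainly in how this adjacency is defined (dense, sparse, directed, or dynamically gated) and how it evolves over time.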

| Method | Ref. | Contribution | Strengths | Limitations |
|---|---|---|---|---|
| STAR | [105] | Alternates between spatial and temporal Transformer to capture both spatial and temporal information, using graph convolution to model social interactions. | Space–time interaction, self-attention | High information dependency, complex computation. |
| MTN | [106] | Combines trajectory and optical flow data, using attention mechanisms and Transformer modules for feature extraction and fusion for accurate predictions. | Multi-mode input, fine-grained representation | Strong data dependency, complex training. |
| AgentFormer | [107] | Simultaneously learns time and social dimensions, using an agent-aware attention mechanism and a multi-agent framework to model social interactions and improve predictions. | Models temporal and social aspects, preserves identity info | High data dependency, high computation. |
| GS Transformer | [108] | Uses a spatial Transformer to capture pedestrian interactions and a memory playback algorithm to ensure temporal consistency in trajectory predictions. | Strong spatial interaction, smooth trajectories | High computational complexity. |
| MID | [109] | Predicts random trajectories using a Transformer architecture, modeling motion uncertainty as an inverse diffusion process within a Markov chain. | Combines diffusion and reverse processes | Low feasibility. |
| SGTN | [110] | Integrates graph convolutional networks and Transformer networks, directly capturing spatio-temporal features through adjacency matrices for complex social environment predictions. | Handles complex interactions, long-term dependencies | Complex calculations, difficult parameter tuning. |
| TUTR | [111] | Provides global predictions and pattern-level parsing using a dual prediction head and social-level Transformer decoder, eliminating post-processing steps like clustering. | Fast, efficient prediction | Needs better adaptability to complex data. |
| RN-Transformer | [112] | Uses crowd travel information for global social interaction modeling; divides the area into nodes and grids for local and global trajectory prediction, integrating a road network model. | Captures global social interactions, explores road networks | Complex optimization, many hyperparameters. |
| PPT-Former | [113] | Non-autoregressive Transformer that uses learnable destination prompts and progressive pretext tasks to generate full trajectories in one shot, with cross-task distillation for smoother long-term motion. | Single-shot generation; cross-task distillation; smoother paths; noticeable error drop. | Two-stage training overhead; prompt length sensitive; no explicit scene context. |
| SIAT | [114] | Social Interaction-Aware Transformer that learns time-varying social graph edge weights from relative motion and fuses parallel GCN spatial interactions with Transformer temporal dynamics inside a single decoder. | Captures both temporal and social without hand-crafted rules; clear accuracy gain in crowds | Graph density hyper-param; GCNs branch adds memory; scene raster unused. |
| ASTRA | [115] | Lightweight scene-aware Transformer that uses a U-Net backbone to extract bird’s-eye scene context once in a graph-aware encoder, produces deterministic or stochastic trajectories via a CVAE decoder, and penalizes non-walkable paths with a weighted scene-compliance loss. | Reduced parameter count; stochastic outputs | U-Net scene features need calibration; CVAE latent variance is tricky; penalty weight tuned per dataset. |
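The common core of the Transformer-based models above is scaled dot-product self-attention over the embedded observation sequence (applied along time, agents, or both). A minimal single-head NumPy sketch (the function name and dimensions are illustrative only):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x:          (T, D) per-step trajectory embeddings.
    wq, wk, wv: (D, D) query/key/value projection matrices.
    Returns the (T, D) attended sequence.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])           # (T, T) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ v                               # weighted sum of values
```

Spatial variants (e.g., STAR's spatial Transformer or AgentFormer's agent-aware attention) apply the same operation across pedestrians at one time step, often with masks restricting which agent pairs may attend to each other.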

| Method | References | Description, Advantages, and Shortcomings |
|---|---|---|
| Noise-based prediction | [55,74,76,78,107,116] | Adds random noise to deterministic predictions, generating multiple outcomes. Simple but lacks accuracy and increases complexity. |
| Anchor-based prediction | [117,118,119] | Uses anchor points for trajectory prediction, improving robustness but difficult anchor selection. |
| Grid-based prediction | [108,120] | Represents locations with grid maps for predictions. Flexible but computationally expensive and sensitive to grid resolution. |
| Bivariate Gaussian output | [92] | Assumes bivariate Gaussian distribution for positions, allowing multi-modal predictions but unrealistic and computationally costly. |
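For the bivariate Gaussian output strategy in the last row, multi-modal prediction amounts to drawing K candidate positions from the predicted distribution. A minimal NumPy sketch (the function name and parameter values are illustrative):

```python
import numpy as np

def sample_bivariate_gaussian(mu, sigma, rho, k=20, seed=None):
    """Draw k candidate positions from a predicted bivariate Gaussian.

    mu:    (2,) predicted mean (mu_x, mu_y).
    sigma: (2,) predicted standard deviations (sigma_x, sigma_y).
    rho:   correlation coefficient in (-1, 1).
    Returns a (k, 2) array of sampled positions, one per candidate mode.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([
        [sigma[0] ** 2,             rho * sigma[0] * sigma[1]],
        [rho * sigma[0] * sigma[1], sigma[1] ** 2],
    ])
    return rng.multivariate_normal(mu, cov, size=k)
```

Best-of-K metrics such as minADE/minFDE are then computed over these samples, which is why the number of samples K must be reported alongside the scores.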

| Dataset | View | Scene | Frame | Number of People | Size of Group | Number of Obstacles |
|---|---|---|---|---|---|---|
| ETH | Bird View | ETH | 1448 | 360 | 243 | 44 |
| | | Hotel | 1168 | 390 | 326 | 25 |
| UCY | Bird View | Univ | 541 | 434 | 294 | 16 |
| | | Zara1 | 866 | 148 | 91 | 34 |
| | | Zara2 | 1052 | 204 | 140 | 34 |

| Dataset | Year | Agents | Scene | Durations | Data Format |
|---|---|---|---|---|---|
| UCY | 2007 | Pedestrian | Campus, Urban | 29.5 min | Frame sequences with pedestrian coordinates |
| ETH | 2009 | Pedestrian | Campus, Sidewalks, Hotel | 25 min | Timestamps and 2D coordinates on a plane |
| Stanford Drone | 2016 | Pedestrian, Bicyclists, Cars | Campus | 5 h | Pixel coordinates for each agent’s trajectory |
| JAAD | 2017 | Pedestrian | Crossing streets | 240 h | Video frames with pedestrian behavior labels |
| Nuscenes | 2019 | Vehicles, Pedestrians, Cyclists | Urban streets, intersections | 1000 scenes | Sensor data including RADAR and LIDAR |
| Argoverse | 2019 | Vehicles, Pedestrians, Cyclists | City streets, intersections | 320 h | 2D and 3D tracking data with detailed map info |
| PIE | 2019 | Pedestrian | Crosswalks, sidewalks | 6 h | Trajectories, gaze data, behavior annotations |

| Class | Evaluation Metrics | Formula and Description |
|---|---|---|
| Essential Evaluation Metrics | Accuracy | $\text{Accuracy} = \frac{N_c}{N}$, where $N_c$ = number of correct predictions and $N$ = total number of predictions. |
| | Precision | $\text{Precision} = \frac{TP}{TP + FP}$, where $TP$ = true positives, $FP$ = false positives. |
| | Recall | $\text{Recall} = \frac{TP}{TP + FN}$, where $TP$ = true positives, $FN$ = false negatives. |
| | F1 Score | $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$, the harmonic mean of Precision and Recall, providing a balance between the two metrics. |
| | Mean Absolute Error (MAE) | $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert \hat{y}_i - y_i \rvert$, where $\hat{y}_i$ is the predicted value and $y_i$ the actual ground-truth value for the $i$-th data point. |
| | Root Mean Squared Error (RMSE) | $\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$. |
| Advanced Trajectory Metrics | Average Displacement Error (ADE) | $\text{ADE} = \frac{1}{N T}\sum_{i=1}^{N}\sum_{t=1}^{T} \lVert \hat{y}_i^t - y_i^t \rVert_2$, the mean Euclidean distance between $y_i^t$, the actual position of pedestrian $i$ at time $t$, and $\hat{y}_i^t$, the predicted position. |
| | Final Displacement Error (FDE) | $\text{FDE} = \frac{1}{N}\sum_{i=1}^{N} \lVert \hat{y}_i^T - y_i^T \rVert_2$, the displacement error at the final prediction step $T$. |
| | Minimum Average Displacement Error (minADE) | $\text{minADE} = \frac{1}{N}\sum_{i=1}^{N} \min_{j} \frac{1}{T}\sum_{t=1}^{T} \lVert \hat{y}_{i,j}^t - y_i^t \rVert_2$, where $y_i^t$ is the actual position of pedestrian $i$ at time $t$ and $\hat{y}_{i,j}^t$ the $j$-th predicted position, with $j$ indexing the sampled trajectory predictions for pedestrian $i$. |
| | Minimum Final Displacement Error (minFDE) | $\text{minFDE} = \frac{1}{N}\sum_{i=1}^{N} \min_{j} \lVert \hat{y}_{i,j}^T - y_i^T \rVert_2$; dividing by $N$ normalizes the sum, giving the average minimum final displacement error across all individuals. |
| Probabilistic Distribution Metrics | Negative Log-Likelihood (NLL) | $\text{NLL} = -\sum_{i=1}^{n} y_i \log(p_i)$, measuring the likelihood of the observed outcomes under the model’s predicted probability distribution; $y_i$ is a binary indicator of the correctness of predicting the data sample in class $i$, $p_i$ the predicted probability of the sample belonging to class $i$, and $n$ the total number of classes. |
| | Kernel Density Estimation NLL (KDE-NLL) | $\text{KDE-NLL} = -\frac{1}{n}\sum_{i=1}^{n} \log \hat{p}(x_i)$, which evaluates the match between predicted and observed probability distributions, where $\hat{p}(x_i)$ is the kernel density estimate of the predicted distribution evaluated at the $i$-th observed data point. |
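The displacement metrics above reduce to a few lines of array arithmetic. A NumPy sketch of ADE/FDE and their best-of-K variants (function names are illustrative; trajectories are assumed to be (T, 2) arrays of positions in world coordinates):

```python
import numpy as np

def ade_fde(pred, gt):
    """ADE and FDE for one predicted trajectory; pred, gt: (T, 2)."""
    err = np.linalg.norm(pred - gt, axis=-1)   # per-step Euclidean error
    return err.mean(), err[-1]                 # average error, final-step error

def min_ade_fde(preds, gt):
    """Best-of-K minADE/minFDE; preds: (K, T, 2), gt: (T, 2)."""
    err = np.linalg.norm(preds - gt[None], axis=-1)   # (K, T) errors
    return err.mean(axis=1).min(), err[:, -1].min()   # best sample wins
```

Averaging these per-pedestrian values over all $N$ pedestrians gives the benchmark scores reported in the next table.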

| Method | ETH (ADE/FDE) | HOTEL (ADE/FDE) | UNIV (ADE/FDE) | ZARA1 (ADE/FDE) | ZARA2 (ADE/FDE) | AVG (ADE/FDE) |
|---|---|---|---|---|---|---|
| Social-LSTM | 0.46/0.95 | 0.15/0.29 | 0.22/0.45 | 0.29/0.48 | 0.22/0.67 | 0.27/0.57 |
| SR-LSTM | 0.63/1.25 | 0.37/0.74 | 0.51/1.10 | 0.41/0.90 | 0.32/0.70 | 0.45/0.94 |
| Trajectron++ | 0.71/1.68 | 0.22/0.46 | 0.41/1.07 | 0.30/0.77 | 0.23/0.59 | 0.37/0.91 |
| Social-GAN | 0.81/1.52 | 0.72/1.61 | 0.60/1.26 | 0.34/0.69 | 0.42/0.84 | 0.58/1.18 |
| CGNS | 0.62/1.40 | 0.70/0.93 | 0.48/1.22 | 0.32/0.59 | 0.35/0.71 | 0.49/0.97 |
| Social-BiGAT | 0.69/1.29 | 0.49/1.01 | 0.55/1.32 | 0.30/0.62 | 0.36/0.75 | 0.48/1.00 |
| Sophie | 0.70/1.43 | 0.76/1.67 | 0.54/1.24 | 0.30/0.63 | 0.38/0.78 | 0.54/1.15 |
| STGAN | 0.65/1.12 | 0.35/0.66 | 0.52/1.10 | 0.34/0.69 | 0.29/0.60 | 0.43/0.83 |
| Social-STGCNN | 0.64/1.11 | 0.49/0.85 | 0.44/0.79 | 0.34/0.53 | 0.30/0.48 | 0.44/0.75 |
| Attention-GCNN | 0.68/1.22 | 0.31/0.41 | 0.39/0.69 | 0.34/0.55 | 0.28/0.44 | 0.40/0.66 |
| AVGCN | 0.62/1.06 | 0.31/0.58 | 0.55/1.20 | 0.33/0.70 | 0.27/0.58 | 0.41/0.82 |
| PTP-STGCN | 0.63/1.04 | 0.34/0.45 | 0.48/0.87 | 0.37/0.61 | 0.30/0.46 | 0.42/0.68 |
| RSBG | 0.80/1.53 | 0.33/0.64 | 0.59/1.25 | 0.40/0.86 | 0.30/0.65 | 0.48/0.99 |
| GraphTCN | 0.39/0.71 | 0.21/0.44 | 0.33/0.66 | 0.21/0.42 | 0.17/0.43 | 0.26/0.51 |
| SGCN | 0.63/1.03 | 0.32/0.55 | 0.37/0.70 | 0.29/0.53 | 0.25/0.45 | 0.37/0.65 |
| Grouptron | 0.70/1.56 | 0.21/0.46 | 0.38/0.97 | 0.30/0.76 | 0.22/0.56 | 0.36/0.86 |
| PECNet | 0.54/0.87 | 0.18/0.24 | 0.35/0.60 | 0.22/0.39 | 0.17/0.30 | 0.29/0.48 |
| VDRGCN | 0.62/0.81 | 0.27/0.37 | 0.38/0.58 | 0.29/0.42 | 0.21/0.32 | 0.35/0.50 |
| DGCN+STDec | 0.28/0.62 | 0.15/0.31 | 0.38/0.88 | 0.27/0.60 | 0.24/0.57 | 0.26/0.59 |
| STAR | 0.56/1.11 | 0.26/0.50 | 0.52/1.15 | 0.41/0.90 | 0.31/0.71 | 0.41/0.87 |
| AgentFormer | 0.26/0.39 | 0.11/0.14 | 0.26/0.46 | 0.15/0.23 | 0.14/0.24 | 0.18/0.29 |
| MID | 0.39/0.66 | 0.13/0.22 | 0.22/0.45 | 0.17/0.30 | 0.13/0.27 | 0.21/0.38 |
| SGTN | 0.40/0.59 | 0.14/0.16 | 0.16/0.58 | 0.17/0.29 | 0.14/0.17 | 0.20/0.28 |
| AFC-RNN | 0.37/0.56 | 0.12/0.19 | 0.20/0.36 | 0.16/0.30 | 0.12/0.21 | 0.19/0.32 |
| DERGCN | 0.48/0.73 | 0.25/0.45 | 0.49/1.05 | 0.30/0.60 | 0.28/0.59 | 0.36/0.68 |
| SIAT | 0.43/0.61 | 0.12/0.17 | 0.24/0.43 | 0.18/0.33 | 0.14/0.25 | 0.22/0.36 |
| ASTRA | 0.47/0.82 | 0.29/0.56 | 0.55/1.00 | 0.34/0.71 | 0.24/0.41 | 0.38/0.70 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Gu, X.; Li, C.; Gao, L.; Niu, X. A Review of Pedestrian Trajectory Prediction Methods Based on Deep Learning Technology. Sensors 2025, 25, 7360. https://doi.org/10.3390/s25237360