Predicting Pedestrian Trajectories with Deep Adversarial Networks Considering Motion and Spatial Information
Abstract
1. Introduction
- Comprehensive motion features (relative distance, speed, and velocity angle) are extracted and used as the query of an attention-based social interaction module.
- A generative network that is aware of both social and spatial interactions generates trajectories that are more socially and physically feasible.
- Multi-trajectory datasets are created with the CARLA simulator and manual annotation, which enables metrics to be evaluated across multiple predicted trajectories.
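As a rough illustration of the motion features named in the first contribution, the sketch below computes a relative distance, a speed, and a velocity angle for one neighbor of a target pedestrian. It is only a minimal sketch under stated assumptions: the function names, the finite-difference velocity estimate, the choice of the neighbor's own speed, and the reading of "velocity angle" as the unsigned angle between the two velocity vectors are all assumptions made for illustration, not the definitions used in the SSA-GAN model.

```python
import math


def velocity(prev_pos, curr_pos, dt):
    """Finite-difference velocity between two consecutive positions (assumed estimator)."""
    return ((curr_pos[0] - prev_pos[0]) / dt, (curr_pos[1] - prev_pos[1]) / dt)


def motion_features(pos_i, vel_i, pos_j, vel_j):
    """Return (relative distance, speed of j, velocity angle) of neighbor j w.r.t. target i.

    Hypothetical helper: the exact feature definitions are assumptions.
    """
    # Relative distance: Euclidean distance between the two pedestrians.
    distance = math.hypot(pos_j[0] - pos_i[0], pos_j[1] - pos_i[1])
    # Speed: magnitude of the neighbor's velocity vector.
    speed = math.hypot(vel_j[0], vel_j[1])
    # Velocity angle: unsigned angle in [0, pi] between the two velocity vectors.
    cross = vel_i[0] * vel_j[1] - vel_i[1] * vel_j[0]
    dot = vel_i[0] * vel_j[0] + vel_i[1] * vel_j[1]
    angle = abs(math.atan2(cross, dot))
    return distance, speed, angle


# Target i walks along +x; neighbor j walks along +y, 5 m away.
vel_i = velocity((-0.5, 0.0), (0.0, 0.0), dt=0.5)
vel_j = velocity((3.0, 3.0), (3.0, 4.0), dt=0.5)
features = motion_features((0.0, 0.0), vel_i, (3.0, 4.0), vel_j)
print(features)
```

In an attention module, a feature tuple of this kind would be embedded and compared against each neighbor's encoded state to produce attention weights; that step is omitted here because its details are not given in this section.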
2. Related Works
2.1. Social Interactions
2.2. Spatial Interactions
2.3. Multimodality
3. Problem Definition and Methods
3.1. Problem Definition
3.2. The Proposed SSA-GAN Model
3.3. Modeling Pedestrian Intent with LSTM
3.4. Social Attention with Motion Features as the Query
3.5. Spatial Attention with Semantic Spatial Features
3.6. GAN
4. Experiments
4.1. Experimental Setup
4.1.1. Datasets
4.1.2. Baselines
4.1.3. Evaluation Metrics
- The Average Displacement Error (ADE): The mean Euclidean distance between the predicted and ground-truth positions over all predicted time steps.
- The Final Displacement Error (FDE): The Euclidean distance between the predicted and ground-truth positions at the final time step.
- The Minimum Average Displacement Error (minADE): The minimum ADE between the predicted trajectories and all possible real trajectories.
- The Minimum Final Displacement Error (minFDE): The minimum FDE between the predicted endpoints and all possible real endpoints.
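The four metrics above can be sketched in a few lines of plain Python. This is a minimal sketch, not the evaluation code used for the reported results: the function names are hypothetical, trajectories are assumed to be equal-length lists of 2D points, and minADE and minFDE are assumed to take the minimum over every pairing of a predicted trajectory with a possible real trajectory.

```python
import math


def displacement(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ade(pred, truth):
    """Average Displacement Error: mean point-wise distance over all time steps."""
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(displacement(p, q) for p, q in zip(pred, truth)) / len(pred)


def fde(pred, truth):
    """Final Displacement Error: distance between the two final points."""
    return displacement(pred[-1], truth[-1])


def min_ade(preds, truths):
    """Minimum ADE over all (predicted, real) trajectory pairs (assumed pairing)."""
    return min(ade(p, t) for p in preds for t in truths)


def min_fde(preds, truths):
    """Minimum FDE over all (predicted, real) trajectory pairs (assumed pairing)."""
    return min(fde(p, t) for p in preds for t in truths)


# Two plausible real futures and two predictions for the same pedestrian.
truths = [[(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 1), (2, 2)]]
preds = [[(0, 0), (1, 0), (2, 1)], [(0, 1), (1, 1), (2, 2)]]
print(ade(preds[0], truths[0]), fde(preds[0], truths[0]))
print(min_ade(preds, truths), min_fde(preds, truths))
```

With a single ground-truth trajectory per pedestrian, `min_ade` and `min_fde` in this sketch reduce to a best-of-N score over the predictions.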
4.1.4. Implementation Details
4.2. Quantitative Results
4.2.1. Comparison with Baselines
4.2.2. Ablation Study
4.2.3. Evaluation on Multi-Trajectory Datasets
4.3. Qualitative Results
4.3.1. Social Attention with Motion Features
4.3.2. Multi-Trajectory Prediction Performance
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

Metric | Dataset | Linear | S-LSTM | S-GAN | SoPhie | SSA-GAN (Ours) |
---|---|---|---|---|---|---|
ADE | ETH-Eth | 1.33 | 1.09 | 0.81 | 0.70 | 0.68 |
ADE | ETH-Hotel | 0.39 | 0.79 | 0.72 | 0.76 | 0.69 |
ADE | UCY-Univ | 0.82 | 0.67 | 0.60 | 0.54 | 0.55 |
ADE | UCY-Zara1 | 0.62 | 0.47 | 0.34 | 0.30 | 0.30 |
ADE | UCY-Zara2 | 0.77 | 0.56 | 0.42 | 0.38 | 0.36 |
ADE | AVG | 0.79 | 0.72 | 0.58 | 0.54 | 0.52 |
FDE | ETH-Eth | 2.94 | 2.35 | 1.52 | 1.43 | 1.44 |
FDE | ETH-Hotel | 0.72 | 1.76 | 1.61 | 1.67 | 1.55 |
FDE | UCY-Univ | 1.59 | 1.40 | 1.26 | 1.24 | 1.22 |
FDE | UCY-Zara1 | 1.21 | 1.00 | 0.69 | 0.63 | 0.63 |
FDE | UCY-Zara2 | 1.48 | 1.17 | 0.84 | 0.78 | 0.75 |
FDE | AVG | 1.59 | 1.54 | 1.18 | 1.15 | 1.12 |

Metric | Dataset | SSA-GAN without Comprehensive Motion Features | SSA-GAN without Social Attention | Complete SSA-GAN |
---|---|---|---|---|
ADE | ETH-Eth | 0.79 | 0.76 | 0.68 |
ADE | ETH-Hotel | 0.66 | 0.68 | 0.69 |
ADE | UCY-Univ | 0.58 | 0.63 | 0.55 |
ADE | UCY-Zara1 | 0.31 | 0.30 | 0.30 |
ADE | UCY-Zara2 | 0.38 | 0.39 | 0.36 |
ADE | AVG | 0.54 | 0.55 | 0.52 |
FDE | ETH-Eth | 1.50 | 1.48 | 1.44 |
FDE | ETH-Hotel | 1.60 | 1.58 | 1.55 |
FDE | UCY-Univ | 1.23 | 1.25 | 1.22 |
FDE | UCY-Zara1 | 0.64 | 0.65 | 0.63 |
FDE | UCY-Zara2 | 0.80 | 0.82 | 0.75 |
FDE | AVG | 1.15 | 1.15 | 1.12 |

Metric | Dataset | S-GAN | SSA-GAN |
---|---|---|---|
minADE | SMTD-Crossroad | 0.68 | 0.63 |
minADE | SMTD-Plaza | 0.75 | 0.69 |
minFDE | SMTD-Crossroad | 0.75 | 0.72 |
minFDE | SMTD-Plaza | 0.81 | 0.78 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lao, L.; Du, D.; Chen, P. Predicting Pedestrian Trajectories with Deep Adversarial Networks Considering Motion and Spatial Information. Algorithms 2023, 16, 566. https://doi.org/10.3390/a16120566