Spatial-Temporal Attentive LSTM for Vehicle-Trajectory Prediction
Abstract
1. Introduction
- We use a spatial attention mechanism to measure the spatial relationships of nearby vehicles and obtain the global spatial feature of the target vehicle.
- We introduce a temporal attention mechanism to assign different weights to the outputs of the encoder, which can capture the relative impacts of different historical moments on future trajectory prediction.
- The motion feature is extracted from velocity and acceleration information and is aggregated with the local and global spatial features into a comprehensive feature representation of the target vehicle.
- Extensive experiments on the NGSIM datasets show that our model improves the accuracy of vehicle-trajectory prediction, achieving the lowest RMSE among the compared methods at prediction horizons of 2–5 s.
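The attention and aggregation steps listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the scaled dot-product scoring, the feature sizes, and the names `attention_pool`, `encoder_outputs`, `decoder_state`, `neighbor_feats`, `target_feat`, and `motion_feat` are all assumptions made for illustration, and random vectors stand in for learned features.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def attention_pool(query, keys, values):
    """Weight each value by how well its key matches the query.

    Scaled dot-product scoring is an assumption; other scoring
    functions (e.g. additive attention) fit the same pattern.
    """
    scores = keys @ query / np.sqrt(query.shape[-1])  # one score per key
    weights = softmax(scores)                         # non-negative, sum to 1
    context = weights @ values                        # weighted sum of values
    return context, weights

rng = np.random.default_rng(0)
hidden = 8  # hypothetical feature size

# Temporal attention: weight the encoder outputs of 16 past time steps.
encoder_outputs = rng.normal(size=(16, hidden))
decoder_state = rng.normal(size=hidden)
temporal_context, temporal_weights = attention_pool(
    decoder_state, encoder_outputs, encoder_outputs)

# Spatial attention: weight 5 neighboring vehicles relative to the target.
neighbor_feats = rng.normal(size=(5, hidden))
target_feat = rng.normal(size=hidden)
global_spatial, spatial_weights = attention_pool(
    target_feat, neighbor_feats, neighbor_feats)

# Aggregation: concatenate local, global spatial, and motion features.
motion_feat = rng.normal(size=4)  # stand-in for velocity/acceleration features
fused = np.concatenate([target_feat, global_spatial, motion_feat])
```

The same pooling function serves both roles in this sketch: over time steps it yields a context vector for the decoder, and over neighbors it yields a global spatial feature for the target vehicle.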
2. Related Works
2.1. Classical Methods for Trajectory Prediction
2.2. LSTM-Based Methods for Trajectory Prediction
2.3. Attention-Based Methods for Trajectory Prediction
3. Methods
3.1. Problem Definition
3.2. Overall Model
3.3. Feature Extraction Module
3.4. Attention-Based Encoder–Decoder Module
3.4.1. Encoder Module
3.4.2. Temporal Attention Layer
3.4.3. Decoder Module
3.5. Loss Function
4. Experiments and Results
4.1. Datasets
4.2. Metrics
4.3. Implementation Details
4.4. Ablation Study
4.5. Baselines
- Constant Velocity (CV) [1]: This model uses the constant-velocity Kalman filter to predict the deterministic trajectory of each vehicle.
- Vanilla-LSTM (V-LSTM): This model is an LSTM encoder–decoder that uses only the past trajectory of the target vehicle to predict its future trajectory.
- Maneuver-LSTM (M-LSTM) [28]: This model applies the encoder to encode historical trajectories of the target vehicle and its neighbors, and the decoder generates the multi-modal trajectory predictions according to the output of the encoder and the maneuver-encoding vector.
- Social-LSTM (S-LSTM) [11]: The model uses the fully connected network as the social pooling layer for sharing information and generates the uni-modal future trajectory.
- Convolutional Social LSTM (CS-LSTM) [12]: This model utilizes the convolutional social pooling to capture the spatial interactions, and the encoder–decoder module is used to generate the multi-modal trajectory distributions of vehicles.
- Multi-Agent Tensor Fusion (MATF) [10]: This model builds a multi-agent tensor that encodes the scene context and the historical trajectories of multiple vehicles. A GAN is then used to predict the future trajectories of the agents.
- Multi-Head Attention LSTM (MHA-LSTM) [14]: This model applies the multi-head attention mechanism to capture the high-order spatial-temporal interactions of vehicles.
- MHA-LSTM(+f): This model extends MHA-LSTM by taking velocity, acceleration, and class information as additional input features.
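To make the simplest baseline concrete, the sketch below extrapolates a trajectory under a constant-velocity assumption. It is a plain linear extrapolation from the last two observed positions, not the constant-velocity Kalman filter used by the CV baseline [1], which would additionally filter measurement noise. The function name `constant_velocity_predict` and the sampling interval `dt = 0.2` are assumptions for illustration.

```python
import numpy as np

def constant_velocity_predict(history, horizon_steps, dt):
    """Extrapolate future (x, y) positions assuming the last velocity persists.

    history: sequence of at least two (x, y) positions, oldest first.
    Returns an array of shape (horizon_steps, 2).
    """
    history = np.asarray(history, dtype=float)
    velocity = (history[-1] - history[-2]) / dt              # finite-difference velocity
    steps = np.arange(1, horizon_steps + 1).reshape(-1, 1)   # 1, 2, ..., horizon_steps
    return history[-1] + steps * dt * velocity

# Hypothetical track: the vehicle advances 2 m per 0.2 s along y.
history = [[0.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
future = constant_velocity_predict(history, horizon_steps=3, dt=0.2)
```

A baseline of this kind ignores neighboring vehicles entirely, which is why it is a useful lower bound when judging the interaction-aware models listed above.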
4.6. Quantitative Analysis
- The spatial attention mechanism can extract better spatial interactions between neighboring vehicles, and the graph-based mechanism is suitable for capturing the social relationship in vehicle-trajectory prediction.
- The temporal attention mechanism can effectively capture the differing influence of each historical time step and assign appropriate weights to the feature representations learned by the encoder. The decoder can therefore draw on more valuable information when generating the future trajectories of vehicles, especially in long-term trajectory prediction.
4.7. Qualitative Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Schneider, N.; Gavrila, D.M. Pedestrian path prediction with recursive Bayesian filters: A comparative study. In Proceedings of the German Conference on Pattern Recognition, Saarbrücken, Germany, 3–6 September 2013; pp. 174–183.
2. Elnagar, A. Prediction of moving objects in dynamic environments using Kalman filters. In Proceedings of the 2001 IEEE International Symposium on Computational Intelligence in Robotics and Automation (Cat. No. 01EX515), Banff, AB, Canada, 29 July–1 August 2001; pp. 414–419.
3. Deo, N.; Rangesh, A.; Trivedi, M.M. How would surround vehicles move? A unified framework for maneuver classification and motion prediction. IEEE Trans. Intell. Veh. 2018, 3, 129–140.
4. Yoon, S.; Kum, D. The multilayer perceptron approach to lateral motion prediction of surrounding vehicles for autonomous vehicles. In Proceedings of the 2016 IEEE Intelligent Vehicles Symposium (IV), Gothenburg, Sweden, 19–22 June 2016; pp. 1307–1312.
5. Tran, Q.; Firl, J. Online maneuver recognition and multimodal trajectory prediction for intersection assistance using non-parametric regression. In Proceedings of the 2014 IEEE Intelligent Vehicles Symposium Proceedings, Dearborn, MI, USA, 8–11 June 2014; pp. 918–923.
6. Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Volume 2.
7. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
8. Tang, J.; Shu, X.; Yan, R.; Zhang, L. Coherence constrained graph LSTM for group activity recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 44, 636–647.
9. Xue, H.; Huynh, D.Q.; Reynolds, M. SS-LSTM: A hierarchical LSTM model for pedestrian trajectory prediction. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1186–1194.
10. Zhao, T.; Xu, Y.; Monfort, M.; Choi, W.; Baker, C.; Zhao, Y.; Wang, Y.; Wu, Y.N. Multi-agent tensor fusion for contextual trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 12126–12134.
11. Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social LSTM: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 961–971.
12. Deo, N.; Trivedi, M.M. Convolutional social pooling for vehicle trajectory prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1468–1476.
13. Messaoud, K.; Yahiaoui, I.; Verroust-Blondet, A.; Nashashibi, F. Non-local social pooling for vehicle trajectory prediction. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 975–980.
14. Messaoud, K.; Yahiaoui, I.; Verroust-Blondet, A.; Nashashibi, F. Attention based vehicle trajectory prediction. IEEE Trans. Intell. Veh. 2020, 6, 175–185.
15. Peng, Y.; Zhang, G.; Shi, J.; Xu, B.; Zheng, L. SRAI-LSTM: A Social Relation Attention-based Interaction-awared LSTM for Human Trajectory Prediction. Neurocomputing 2021, 490, 258–268.
16. Lefèvre, S.; Vasquez, D.; Laugier, C. A survey on motion prediction and risk assessment for intelligent vehicles. ROBOMECH J. 2014, 1, 1.
17. Firl, J.; Stübing, H.; Huss, S.A.; Stiller, C. Predictive maneuver evaluation for enhancement of car-to-x mobility data. In Proceedings of the 2012 IEEE Intelligent Vehicles Symposium, Madrid, Spain, 3–7 June 2012; pp. 558–564.
18. Schreier, M.; Willert, V.; Adamy, J. Bayesian, maneuver-based, long-term trajectory prediction and criticality assessment for driver assistance systems. In Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), Qingdao, China, 8–11 October 2014; pp. 334–341.
19. Aoude, G.S.; Desaraju, V.R.; Stephens, L.H.; How, J.P. Driver behavior classification at intersections and validation on large naturalistic data set. IEEE Trans. Intell. Transp. Syst. 2012, 13, 724–736.
20. Houenou, A.; Bonnifait, P.; Cherfaoui, V.; Yao, W. Vehicle trajectory prediction based on motion model and maneuver recognition. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 4363–4369.
21. Pineda, F.J. Generalization of back-propagation to recurrent neural networks. Phys. Rev. Lett. 1987, 59, 2229.
22. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 17–19 June 2013; pp. 1310–1318.
23. Donahue, J.; Anne Hendricks, L.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; Darrell, T. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2625–2634.
24. Yao, H.; Tang, X.; Wei, H.; Zheng, G.; Li, Z. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5668–5675.
25. Park, S.H.; Kim, B.; Kang, C.M.; Chung, C.C.; Choi, J.W. Sequence-to-sequence prediction of vehicle trajectory via LSTM encoder-decoder architecture. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 1672–1678.
26. Zhao, X.; Chen, Y.; Guo, J.; Zhao, D. A spatial-temporal attention model for human trajectory prediction. IEEE CAA J. Autom. Sin. 2020, 7, 965–974.
27. Altché, F.; de La Fortelle, A. An LSTM network for highway trajectory prediction. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 353–359.
28. Deo, N.; Trivedi, M.M. Multi-modal trajectory prediction of surrounding vehicles with maneuver based LSTMs. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 1179–1184.
29. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
30. Gregor, K.; Danihelka, I.; Graves, A.; Rezende, D.; Wierstra, D. Draw: A recurrent neural network for image generation. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 1462–1471.
31. Zhou, G.; Zhu, X.; Song, C.; Fan, Y.; Zhu, H.; Ma, X.; Yan, Y.; Jin, J.; Li, H.; Gai, K. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1059–1068.
32. Xiao, Y.; Yin, H.; Zhang, Y.; Qi, H.; Zhang, Y.; Liu, Z. A dual-stage attention-based Conv-LSTM network for spatio-temporal correlation and multivariate time series prediction. Int. J. Intell. Syst. 2021, 36, 2036–2057.
33. Chen, K.; Song, X.; Ren, X. Modeling social interaction and intention for pedestrian trajectory prediction. Phys. A Stat. Mech. Appl. 2021, 570, 125790.
34. Wang, R.; Cui, Y.; Song, X.; Chen, K.; Fang, H. Multi-information-based convolutional neural network with attention mechanism for pedestrian trajectory prediction. Image Vis. Comput. 2021, 107, 104110.
35. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
36. Wu, Y.; Chen, G.; Li, Z.; Zhang, L.; Xiong, L.; Liu, Z.; Knoll, A. HSTA: A Hierarchical Spatio-Temporal Attention Model for Trajectory Prediction. IEEE Trans. Veh. Technol. 2021, 70, 11295–11307.
37. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
38. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
39. Xu, Y.; Ren, D.; Li, M.; Chen, Y.; Fan, M.; Xia, H. Tra2Tra: Trajectory-to-Trajectory Prediction With a Global Social Spatial-Temporal Attentive Neural Network. IEEE Robot. Autom. Lett. 2021, 6, 1574–1581.
40. Yang, J.; Sun, X.; Wang, R.G.; Xue, L.X. PTPGC: Pedestrian trajectory prediction by graph attention network with ConvLSTM. Robot. Auton. Syst. 2022, 148, 103931.
41. Kosaraju, V.; Sadeghian, A.; Martín-Martín, R.; Reid, I.; Rezatofighi, S.H.; Savarese, S. Social-bigat: Multimodal trajectory forecasting using bicycle-gan and graph attention networks. arXiv 2019, arXiv:1907.03395.
42. Lin, Z.; Feng, M.; Santos, C.N.D.; Yu, M.; Xiang, B.; Zhou, B.; Bengio, Y. A structured self-attentive sentence embedding. arXiv 2017, arXiv:1703.03130.
43. Chen, J.; Chen, G.; Li, Z.; Wu, Y.; Knoll, A. Attention Based Graph Convolutional Networks for Trajectory Prediction. In Proceedings of the 2021 6th IEEE International Conference on Advanced Robotics and Mechatronics (ICARM), Chongqing, China, 3–5 July 2021; pp. 852–857.
44. Ding, W.; Chen, J.; Shen, S. Predicting vehicle behaviors over an extended horizon using behavior interaction network. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 8634–8640.
45. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
RMSE (m) of each model variant by prediction horizon:

| Model Variant | 1 s | 2 s | 3 s | 4 s | 5 s |
|---|---|---|---|---|---|
| V-LSTM | 0.68 | 1.65 | 2.91 | 4.46 | 6.27 |
| SA-LSTM | 0.61 | 1.24 | 2.01 | 2.80 | 3.72 |
| STA-LSTM | 0.56 | 1.13 | 1.85 | 2.63 | 3.53 |
| STAM-LSTM | 0.43 | 0.96 | 1.60 | 2.37 | 3.24 |
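The gain contributed by each variant is easier to judge as a relative RMSE reduction than as raw meters. The sketch below computes that reduction from the four rows of the ablation table; the RMSE values are copied from the table, while the helper name `relative_reduction` and the choice of V-LSTM as the reference are assumptions for illustration.

```python
# RMSE (m) at horizons of 1-5 s, copied from the ablation table.
rmse = {
    "V-LSTM":    [0.68, 1.65, 2.91, 4.46, 6.27],
    "SA-LSTM":   [0.61, 1.24, 2.01, 2.80, 3.72],
    "STA-LSTM":  [0.56, 1.13, 1.85, 2.63, 3.53],
    "STAM-LSTM": [0.43, 0.96, 1.60, 2.37, 3.24],
}

def relative_reduction(baseline, model):
    """Percentage by which `model` lowers RMSE relative to `baseline`, per horizon."""
    return [100.0 * (b - m) / b for b, m in zip(baseline, model)]

# Reduction of every variant relative to V-LSTM.
reductions = {
    name: relative_reduction(rmse["V-LSTM"], values)
    for name, values in rmse.items()
    if name != "V-LSTM"
}

for name, values in reductions.items():
    print(name, [f"{v:.1f}%" for v in values])
```

For instance, at the 5 s horizon STAM-LSTM reduces the V-LSTM error by (6.27 − 3.24) / 6.27, or roughly 48%.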
RMSE (m) of each model by prediction horizon:

| Dataset | Prediction Horizon (s) | CV | V-LSTM | M-LSTM | S-LSTM | CS-LSTM | MATF | MHA-LSTM | MHA-LSTM(+f) | STAM-LSTM |
|---|---|---|---|---|---|---|---|---|---|---|
| NGSIM | 1 | 0.73 | 0.68 | 0.58 | 0.65 | 0.61 | 0.67 | 0.56 | 0.41 | 0.43 |
| NGSIM | 2 | 1.78 | 1.65 | 1.26 | 1.31 | 1.27 | 1.34 | 1.22 | 1.01 | 0.96 |
| NGSIM | 3 | 3.13 | 2.91 | 2.12 | 2.16 | 2.09 | 2.08 | 2.01 | 1.74 | 1.60 |
| NGSIM | 4 | 4.78 | 4.46 | 3.24 | 3.25 | 3.10 | 2.97 | 3.00 | 2.67 | 2.37 |
| NGSIM | 5 | 6.68 | 6.27 | 4.66 | 4.55 | 4.37 | 4.13 | 4.25 | 3.83 | 3.24 |
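The table reports RMSE in meters at whole-second horizons. A minimal sketch of how such per-horizon values might be computed from predicted and ground-truth trajectories follows. The exact evaluation protocol is not given in this excerpt, so the definition used here (root of the mean squared Euclidean displacement over all samples at a given time step), the function name `rmse_per_horizon`, and the sampling rate `steps_per_second` are assumptions.

```python
import numpy as np

def rmse_per_horizon(pred, truth, steps_per_second):
    """RMSE of the displacement error at each whole-second horizon.

    pred, truth: arrays of shape (num_samples, num_steps, 2) holding (x, y) in meters.
    Returns one RMSE value per full second covered by num_steps.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    sq_err = np.sum((pred - truth) ** 2, axis=-1)   # squared distance, shape (N, T)
    rmse_t = np.sqrt(np.mean(sq_err, axis=0))       # average over samples, shape (T,)
    # Index of the last step of each second: steps_per_second - 1, 2 * steps_per_second - 1, ...
    idx = np.arange(steps_per_second - 1, rmse_t.shape[0], steps_per_second)
    return rmse_t[idx]

# Hypothetical data: every prediction is off by (3 m, 4 m), i.e. 5 m of displacement.
truth = np.zeros((4, 10, 2))
pred = truth + np.array([3.0, 4.0])
result = rmse_per_horizon(pred, truth, steps_per_second=5)  # 10 steps -> horizons of 1 s and 2 s
```

Averaging squared errors before taking the root penalizes large misses more heavily than a mean displacement error would, which matters most at the longer horizons in the table.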
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jiang, R.; Xu, H.; Gong, G.; Kuang, Y.; Liu, Z. Spatial-Temporal Attentive LSTM for Vehicle-Trajectory Prediction. ISPRS Int. J. Geo-Inf. 2022, 11, 354. https://doi.org/10.3390/ijgi11070354