Rapid Vehicle Trajectory Prediction Based on Multi-Attention Mechanism for Fusing Multimodal Information
Abstract
1. Introduction
- We propose a trajectory prediction framework based on the encoder–decoder paradigm that uses historical trajectory information, interaction data, and spatial information from traffic scenarios to produce accurate and efficient trajectory predictions.
- We introduce a sparse graph attention learning method to capture the interactions among agents in traffic scenarios. This method efficiently extracts interaction features within local areas and adaptively eliminates redundant interactions.
- We propose a stochastic non-autoregressive query generation method that obtains the decoding queries in a single inference step. This yields a fully non-autoregressive Transformer network that performs multimodal trajectory prediction by leveraging the rich interaction features.
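
To make the idea of attention that "adaptively eliminates redundant interactions" concrete, the following is a minimal sketch rather than the method used in this paper. It assumes sparsemax as the normalizer, which is one well-known way to obtain attention weights that are exactly zero for low-scoring neighbors. The function names `sparsemax` and `sparse_attention` and the single-head, single-query layout are illustrative assumptions.

```python
import numpy as np


def sparsemax(z: np.ndarray) -> np.ndarray:
    """Project a 1-D score vector onto the probability simplex.

    Unlike softmax, the result can contain exact zeros, so weakly
    related neighbors are dropped instead of receiving a small weight.
    """
    z_sorted = np.sort(z)[::-1]              # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum      # True for a prefix of the sorted scores
    k_max = k[support][-1]                   # size of the support set
    tau = (cumsum[k_max - 1] - 1) / k_max    # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)


def sparse_attention(query: np.ndarray, keys: np.ndarray, values: np.ndarray):
    """Single-head attention for one agent (assumed layout, for illustration).

    query: (d,), keys: (n, d), values: (n, d_v).
    Returns the aggregated feature and the sparse attention weights.
    """
    scores = keys @ query / np.sqrt(query.size)   # scaled dot-product scores
    weights = sparsemax(scores)                   # sparse alternative to softmax
    return weights @ values, weights
```

With scores `[2.0, 1.0, -1.0]`, `sparsemax` returns `[1.0, 0.0, 0.0]`: the two weaker neighbors receive no attention at all, whereas softmax would still assign them small positive weights.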
2. Related Work
2.1. Recurrent Neural Networks
2.2. Graph Neural Networks
2.3. Transformer
3. Methods
3.1. Problem Formulation
3.1.1. Assumptions
3.1.2. Problem Description
3.2. Overview
3.3. Local Encoder
3.3.1. Data Preprocessing Module
3.3.2. Agent–Agent Interaction Module
3.3.3. Temporal Transformer
3.3.4. Agent–Lane Interaction Module
3.4. Global Encoder
3.5. Query Generation Module
3.6. Decoder
3.7. Loss Function Definition
4. Results
4.1. Implementation Details
4.2. Dataset and Metrics
4.3. Quantitative Analysis
4.3.1. Comparative Experiment
4.3.2. Ablation Experiment
4.4. Qualitative Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Epoch | 50 | Radius of local regions | 50 m |
| Hidden dimension | 128 | Number of modes (K) | 6 |
| Batch size | 128 | Horizon (H) | 3 s |
| Initial learning rate | 10⁻³ | A-A layers | 3 |
| Weight decay | 10⁻⁴ | A-L layers | 1 |
| Dropout rate | 0.1 | Global layers | 3 |
| Number of heads | 8 | TT layers | 4 |

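
The hyperparameters above can be gathered into a single configuration object, and the "radius of local regions" can be illustrated with a neighbor mask. This is a sketch under stated assumptions: the names `TrainConfig` and `local_neighbor_mask` are hypothetical, agent positions are assumed to be 2-D coordinates in metres, and treating the radius as inclusive is a choice made here, not something the table specifies.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from the parameter table; field names are illustrative."""
    epochs: int = 50
    hidden_dim: int = 128
    batch_size: int = 128
    initial_lr: float = 1e-3
    weight_decay: float = 1e-4
    dropout: float = 0.1
    num_heads: int = 8
    local_radius_m: float = 50.0
    num_modes: int = 6        # K
    horizon_s: float = 3.0    # H
    aa_layers: int = 3        # agent-agent interaction layers
    al_layers: int = 1        # agent-lane interaction layers
    global_layers: int = 3
    tt_layers: int = 4        # temporal Transformer layers


def local_neighbor_mask(positions: np.ndarray, radius: float) -> np.ndarray:
    """Return an (n, n) boolean mask; entry [i, j] is True when agent j
    lies within `radius` of agent i (self-pairs are excluded)."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    mask = dist <= radius
    np.fill_diagonal(mask, False)
    return mask


cfg = TrainConfig()
positions = np.array([[0.0, 0.0], [30.0, 0.0], [90.0, 0.0]])
mask = local_neighbor_mask(positions, cfg.local_radius_m)
```

Here only the first two agents are neighbors of each other, since the third is 60 m or more from both. With 8 heads and a hidden dimension of 128, each attention head would operate on 16 features, assuming the usual even split across heads.
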

| Method | minADE (K = 1) | minFDE (K = 1) | MR (K = 1) | minADE (K = 6) | minFDE (K = 6) | MR (K = 6) | Time (K = 6) |
|---|---|---|---|---|---|---|---|
| LaneRCNN | 1.685 | 3.692 | 0.569 | 0.904 | 1.453 | 0.123 | - |
| LaneGCN | 1.702 | 3.762 | 0.588 | 0.870 | 1.362 | 0.162 | - |
| TNT | 2.174 | 4.959 | 0.710 | 0.910 | 1.446 | 0.166 | 531 |
| HiVT | 1.598 | 3.533 | 0.547 | 0.774 | 1.169 | 0.127 | 153 |
| LAformer | 1.553 | 3.453 | 0.547 | 0.772 | 1.163 | 0.125 | 115 |
| DenseTNT | 1.679 | 3.632 | 0.584 | 0.882 | 1.282 | 0.126 | 482 |
| Ours | 1.557 | 3.451 | 0.545 | 0.774 | 1.158 | 0.118 | 108 |

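
The three accuracy metrics reported above can be computed as in the sketch below. It is illustrative only and rests on assumptions: the function name `evaluate` and the array shapes are hypothetical, minADE is taken as the lowest average displacement over the K modes (some benchmark implementations instead report the ADE of the mode with the lowest endpoint error), and the 2.0 m miss threshold is a commonly used value assumed here rather than one stated in this section.

```python
import numpy as np


def evaluate(pred: np.ndarray, gt: np.ndarray, miss_threshold: float = 2.0) -> dict:
    """Compute minADE, minFDE and miss rate (MR) over a set of samples.

    pred: (N, K, T, 2) predicted trajectories, K modes per sample.
    gt:   (N, T, 2) ground-truth trajectories.
    """
    # Per-timestep Euclidean error of every mode: shape (N, K, T).
    dist = np.linalg.norm(pred - gt[:, None], axis=-1)
    # Best average displacement over the K modes, per sample.
    min_ade = dist.mean(axis=2).min(axis=1)
    # Best final-step displacement over the K modes, per sample.
    min_fde = dist[:, :, -1].min(axis=1)
    # A sample is a miss if even its best endpoint is beyond the threshold.
    missed = min_fde > miss_threshold
    return {
        "minADE": float(min_ade.mean()),
        "minFDE": float(min_fde.mean()),
        "MR": float(missed.mean()),
    }
```

Setting K = 1 reduces the minimum over modes to the error of the single predicted trajectory, which corresponds to the K = 1 columns of the table.
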

| Model | Enabled modules (out of TT, A-A, A-L, Global, QG) | minADE (K = 6) | minFDE (K = 6) | MR (K = 6) | Time (K = 6) |
|---|---|---|---|---|---|
| Model_1 | √ √ √ √ | 1.251 | 2.132 | 0.287 | 84 |
| Model_2 | √ √ √ √ | 0.868 | 1.348 | 0.155 | 93 |
| Model_3 | √ √ √ √ | 0.811 | 1.175 | 0.132 | 100 |
| Model_4 | √ √ √ √ | 0.819 | 1.171 | 0.129 | 96 |
| Model_5 | √ √ √ √ | 0.751 | 1.141 | 0.120 | 443 |
| Complete Model | √ √ √ √ √ | 0.774 | 1.158 | 0.118 | 108 |

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ge, L.; Wang, S.; Wang, G. Rapid Vehicle Trajectory Prediction Based on Multi-Attention Mechanism for Fusing Multimodal Information. Electronics 2024, 13, 4806. https://doi.org/10.3390/electronics13234806