Deep Learning-Based Multimodal Trajectory Prediction with Traffic Light
Abstract
1. Introduction
- Among the surrounding environment elements, we incorporate traffic light information, covering the various signal configurations that occur at urban intersections.
- Whereas previous sequence-prediction models with social interaction mainly used agents' trajectories as input, our model additionally takes agent state information such as speed and acceleration as input.
- To reflect the scene context, we extract image features with ResNet18, a model with strong image-recognition performance, and feed them into the network.
- We predict multiple plausible trajectories based on a generative model.
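Since the model consumes agent state features (speed, acceleration) alongside raw positions, the sketch below shows one way such features could be derived from a recorded trajectory by finite differences. This is a minimal illustration, not the authors' code: the function name `agent_state_features` and the 0.4 s step (inferred from the 1.6 s horizon / 4 time steps in the result tables) are assumptions.

```python
def agent_state_features(xs, ys, dt=0.4):
    """Derive per-step speed and acceleration from a position sequence
    by finite differences. dt=0.4 s is assumed from the result tables
    (1.6 s horizon / 4 time steps)."""
    speeds = []
    for i in range(1, len(xs)):
        dx, dy = xs[i] - xs[i - 1], ys[i] - ys[i - 1]
        # Euclidean displacement per step divided by the time step.
        speeds.append((dx * dx + dy * dy) ** 0.5 / dt)
    # Acceleration as the finite difference of consecutive speeds.
    accels = [(speeds[i] - speeds[i - 1]) / dt for i in range(1, len(speeds))]
    return speeds, accels
```

The resulting speed and acceleration sequences would be concatenated with the positions to form the per-agent input described above.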
2. Related Works
2.1. LSTM for Sequence Prediction
2.2. Agent-Agent Models
2.3. Multimodal and Generative Modeling
2.4. Scene Context-Aware Prediction
3. Materials and Methods
3.1. Problem Definition
3.2. Model Overview
3.3. Generating Traffic Signal Information
3.4. Model Details
3.4.1. Scene Context Encoder
3.4.2. LSTM-Based Encoder
3.4.3. Social Interaction
3.4.4. Feature Fusion
3.4.5. GAN-Based Decoder
4. Experiments
4.1. Dataset
4.2. Preprocessing
4.3. Experimental Environments
4.4. Metrics
4.4.1. ADE (Average Displacement Error)
4.4.2. FDE (Final Displacement Error)
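Both metrics compare a predicted trajectory against the ground truth with Euclidean distances: ADE averages the error over all prediction steps, while FDE measures it only at the final step. A minimal reference implementation (function names are illustrative):

```python
import math

def ade(pred, gt):
    """Average Displacement Error: mean Euclidean distance between
    predicted and ground-truth points over all prediction steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def fde(pred, gt):
    """Final Displacement Error: Euclidean distance at the last step."""
    return math.dist(pred[-1], gt[-1])
```

For a prediction that drifts from the ground truth, FDE is typically larger than ADE, since the error accumulates toward the end of the horizon.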
4.5. Implementation Detail
4.6. Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Two-Digit Code | State (RGB Values of Traffic Light) |
---|---
00 | G G G R R G R R G R |
01 | G R R R R G R R R R |
02 | R R G R R G R R G R |
03 | R R R G G R G G R G |
04 | R R R R G R R G R G |
05 | R R R R R R G G R R |
06 | R R R R R R R R R R |
07 | R R R Y Y R R R R Y |
08 | R R R Y Y R Y Y R Y |
09 | Y Y Y R R Y R R Y R |
10 | G G R R R R R R R R |
11 | R R R G R R G R R R |
12 | R R R R R R Y Y R R |
13 | Y R R R R Y R R R R |
14 | G G G G G R R R R R |
15 | Y Y Y Y Y R R R R R |
16 | Y Y R R R R R R R R |
17 | R R R R R G G R R R |
18 | R R R R R G G G G G |
19 | R R R R R Y Y Y Y Y |
20 | R R R R R Y Y R R R |
21 | R G R G G R R R R R |
22 | R R R R R R G R G R |
23 | G R G R R R R R R R |
24 | R R R R R G R G R G |
25 | G R G R G R R R R R |
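Each two-digit code above compresses the state of ten signal lamps into one token. One way such a code could be turned into a fixed-size model input is a flat one-hot vector; the sketch below is an illustrative assumption (the R/Y/G one-hot ordering, the names `SIGNAL_TABLE` and `encode_signal`, and the two-entry table excerpt are not from the paper).

```python
# One-hot encoding per lamp state; the R/Y/G ordering is an assumption.
ONEHOT = {"R": (1, 0, 0), "Y": (0, 1, 0), "G": (0, 0, 1)}

# Two entries copied from the Appendix A table; the full table has 26 codes.
SIGNAL_TABLE = {
    "00": "G G G R R G R R G R",
    "06": "R R R R R R R R R R",
}

def encode_signal(code):
    """Flatten a two-digit signal code into a 30-dim vector
    (10 lamps x 3 lamp states)."""
    vec = []
    for lamp in SIGNAL_TABLE[code].split():
        vec.extend(ONEHOT[lamp])
    return vec
```

Code "06" (all red) then maps to ten repetitions of the red one-hot, giving the model an unambiguous numeric view of the intersection's signal phase.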
Appendix B
Prediction Time (s) | Metrics | SGAN | Ours | Ours without Traffic Light
---|---|---|---|---
1.6 | ADE | 5.36 | 2.87 | 2.76
1.6 | AVG | 7.15 | 3.65 | 3.64
1.6 | FDE | 8.94 | 4.42 | 4.52
4.8 | ADE | 6.84 | 3.96 | 12.65
4.8 | AVG | 10.18 | 5.91 | 19.04
4.8 | FDE | 13.52 | 7.86 | 25.43
Appendix C
(a) Excluding Traffic Light Signal

Prediction Time (s) | Metrics | Ours without Traffic Light
---|---|---
1.6 | ADE | 3.20
1.6 | AVG | 4.34
1.6 | FDE | 5.47
3.2 | ADE | 5.62
3.2 | AVG | 8.22
3.2 | FDE | 10.82
4.8 | ADE | 7.77
4.8 | AVG | 11.65
4.8 | FDE | 15.52
(b) Including Traffic Light Signal

S | Acc | Ang | Metrics | 1.6 s | 3.2 s | 4.8 s
---|---|---|---|---|---|---
√ | | | ADE | 2.86 | 4.58 | 7.37
√ | | | AVG | 3.83 | 6.68 | 10.85
√ | | | FDE | 4.79 | 8.77 | 14.32
√ | √ | | ADE | 2.60 | 3.94 | 4.90
√ | √ | | AVG | 4.72 | 5.63 | 7.31
√ | √ | | FDE | 4.12 | 7.32 | 9.71
√ | | √ | ADE | 3.65 | 6.17 | 8.30
√ | | √ | AVG | 4.86 | 8.82 | 12.11
√ | | √ | FDE | 6.07 | 11.47 | 15.93
√ | √ | √ | ADE | 3.10 | 5.46 | 7.50
√ | √ | √ | AVG | 4.22 | 8.03 | 11.31
√ | √ | √ | FDE | 5.34 | 10.59 | 15.11
References
- Lefèvre, S.; Vasquez, D.; Laugier, C. A survey on motion prediction and risk assessment for intelligent vehicles. ROBOMECH J. 2014, 1, 1. [Google Scholar] [CrossRef]
- Ammoun, S.; Nashashibi, F. Real time trajectory prediction for collision risk estimation between vehicles. In Proceedings of the 2009 IEEE 5th International Conference on Intelligent Computer Communication and Processing, Cluj-Napoca, Romania, 27–29 August 2009. [Google Scholar] [CrossRef]
- Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Li, F.F.; Savarese, S. Social LSTM: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
- Ettinger, S.; Cheng, S.; Caine, B.; Liu, C.; Zhao, H.; Pradhan, S.; Chai, Y.; Sapp, B.; Qi, C.; Zhou, Y.; et al. Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021. [Google Scholar] [CrossRef]
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar] [CrossRef]
- Deo, N.; Wolff, E.M.; Beijbom, O. Multimodal Trajectory Prediction Conditioned on Lane-Graph Traversals. In Proceedings of the 5th Conference on Robot Learning (CoRL 2021), London, UK, 8–11 November 2021. [Google Scholar] [CrossRef]
- Luo, C.; Sun, L.; Dabiri, D.; Yuille, A. Probabilistic Multi-modal Trajectory Prediction with Lane Attention for Autonomous Vehicles. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021. [Google Scholar] [CrossRef]
- Deo, N.; Trivedi, M.M. Multi-Modal Trajectory Prediction of Surrounding Vehicles with Maneuver based LSTMs. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018. [Google Scholar] [CrossRef]
- Zhang, Z. ResNet-Based Model for Autonomous Vehicles Trajectory Prediction. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 15–17 January 2021. [Google Scholar] [CrossRef]
- Sheng, Z.; Xu, Y.; Xue, S.; Li, D. Graph-Based Spatial-Temporal Convolutional Network for Vehicle Trajectory Prediction in Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2022, 23, 17654–17665. [Google Scholar] [CrossRef]
- Zhang, Y.; Wang, W.; Guo, W.; Lv, P.; Xu, M.; Chen, W.; Manocha, D. D2-TPred: Discontinuous Dependency for Trajectory Prediction Under Traffic Lights. In Proceedings of the ECCV 2022: 17th European Conference (ECCV), Tel Aviv, Israel, 23–27 October 2022. [Google Scholar] [CrossRef]
- Oh, G.; Peng, H. Impact of traffic lights on trajectory forecasting of human-driven vehicles near signalized intersections. arXiv 2020, arXiv:1906.00486v4. [Google Scholar]
- Chai, Y.; Sapp, B.; Bansal, M.; Anguelov, D. MultiPath: Multiple probabilistic anchor trajectory hypotheses for behavior prediction. In Proceedings of the 2019 Conference on Robot Learning (CoRL 2019), Osaka, Japan, 30 October–1 November 2019. [Google Scholar] [CrossRef]
- Multi Agent Traffic Dataset. Available online: https://wiselab.uwaterloo.ca/waterloo-multi-agent-traffic-dataset/intersection-dataset (accessed on 1 July 2023).
- Kong, X.; Xing, W.; Wei, X.; Bao, P.; Zhang, J.; Lu, W. STGAT: Spatial-temporal graph attention networks for traffic flow forecasting. IEEE Access 2020, 8, 134363–134372. [Google Scholar] [CrossRef]
- Kim, B.; Park, S.; Lee, S.; Khoshimjonov, E.; Kum, D.; Kim, J.; Kim, J.; Choi, J. Lapred: Lane-aware prediction of multi-modal future trajectories of dynamic agents. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar] [CrossRef]
- Gupta, A.; Johnson, J.; Li, F.F.; Savarese, S.; Alahi, A. Social GAN: Socially acceptable trajectories with generative adversarial networks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar] [CrossRef]
- Sadeghian, A.; Kosaraju, V.; Sadeghian, A.; Hirose, N.; Rezatofighi, H.; Savarese, S. Sophie: An attentive gan for predicting paths compliant to social and physical constraints. In Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
- Deo, N.; Trivedi, M.M. Convolutional Social Pooling for Vehicle Trajectory Prediction. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar] [CrossRef]
- Lee, N.; Choi, W.; Vernaza, P.; Choy, C.B.; Torr, P.H.S.; Chandraker, M. DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar] [CrossRef]
- Zhao, T.; Xu, Y.; Monfort, M.; Choi, W.; Baker, C.; Zhao, Y.; Wang, Y.; Wu, Y.N. Multi-Agent Tensor Fusion for Contextual Trajectory Prediction. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
- Cui, H.; Radosavljevic, V.; Chou, F.C.; Lin, T.H.; Nguyen, T.; Huang, T.K.; Schneider, J.; Djuric, N. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019. [Google Scholar] [CrossRef]
- Phan-Minh, T.; Grigore, E.C.; Boulton, F.A.; Beijbom, O.; Wolff, E.M. CoverNet: Multimodal Behavior Prediction Using Trajectory Sets. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar] [CrossRef]
- Teeti, H.; Khan, S.; Shahbaz, A.; Bradley, A.; Cuzzolin, F. Vision-based Intention and Trajectory Prediction in Autonomous Vehicles: A Survey. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22), Vienna, Austria, 23–29 July 2022; pp. 5630–5637. [Google Scholar] [CrossRef]
- Hou, W.; Wu, Z.; Jung, H. Video Road Vehicle Detection and Tracking based on OpenCV. J. Inf. Commun. Converg. Eng. 2022, 20, 226–233. [Google Scholar] [CrossRef]
- Kosaraju, V.; Sadeghian, A.; Martin-Martin, R.; Reid, I.; Rezatofighi, H.; Savarese, S. Social-BiGAT: Multimodal trajectory forecasting using Bicycle-GAN and graph attention networks. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar] [CrossRef]
- Salzmann, T.; Ivanovic, B.; Chakravarty, P.; Pavone, M. Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data. In Proceedings of the Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; Part XVIII 16, pp. 683–700. [Google Scholar] [CrossRef]
- agrimgupta92/sgan, GitHub repository. Available online: https://github.com/agrimgupta92/sgan (accessed on 8 September 2023).
- Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar] [CrossRef]
- Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. arXiv 2018, arXiv:1707.01926v3. [Google Scholar]
- Wu, Q.; Zhao, Y.; Fan, Q.; Fan, P.; Wang, J.; Zhang, C. Mobility-Aware Cooperative Caching in Vehicular Edge Computing Based on Asynchronous Federated and Deep Reinforcement Learning. IEEE J. Sel. Top. Signal Process. 2023, 17, 66–81. [Google Scholar] [CrossRef]
Operating System | GPU Card | Library
---|---|---
Ubuntu 20.04 | NVIDIA RTX A5000 24 GB × 2 | PyTorch v1.13.0, CUDA 11.6
Prediction Time (s)/Time Steps | Metrics | SGAN | Ours (State Only with Speed, Acc) | Ours (State with Speed, Acc, Angle)
---|---|---|---|---
1.6 s/4 | ADE | 3.26 | 2.60 | 3.10
1.6 s/4 | AVG | 4.34 | 3.36 | 4.22
1.6 s/4 | FDE | 5.41 | 4.12 | 5.34
3.2 s/8 | ADE | 5.53 | 3.94 | 5.46
3.2 s/8 | AVG | 8.03 | 5.63 | 8.03
3.2 s/8 | FDE | 10.52 | 7.32 | 10.59
4.8 s/12 | ADE | 7.55 | 4.90 | 7.50
4.8 s/12 | AVG | 11.30 | 7.31 | 11.31
4.8 s/12 | FDE | 15.04 | 9.71 | 15.11
Prediction Time (s)/Time Steps | Metrics | Ours (with Scene Context) | Ours (without Scene Context)
---|---|---|---
3.2 s/8 | ADE | 7.35 | 5.46
3.2 s/8 | AVG | 10.59 | 8.03
3.2 s/8 | FDE | 13.83 | 10.59
| Time (s) | 211 | 184
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lee, S.; Park, H.; You, Y.; Yong, S.; Moon, I.-Y. Deep Learning-Based Multimodal Trajectory Prediction with Traffic Light. Appl. Sci. 2023, 13, 12339. https://doi.org/10.3390/app132212339