# MALS-Net: A Multi-Head Attention-Based LSTM Sequence-to-Sequence Network for Socio-Temporal Interaction Modelling and Trajectory Prediction

## Abstract

## 1. Introduction

- To model the social dependence among past trajectories, we propose a Social Multi-Head Attention (SMHA) mechanism. To model the temporal dependence, a Temporal Multi-Head Attention (TMHA) mechanism follows it, so that the encoder attends to both social and temporal interactions when encoding the input data.
- We propose a similar MHA-based LSTM decoding step that extracts the predicted socio-temporal interaction at each successive decoding step. This improves the accuracy of successive predictions and reduces the error accumulation of the transformer decoder.
- To prevent overfitting, we evaluate the method on BLVD, a large-scale, less noisy vehicle trajectory dataset extracted from egocentric onboard-sensor data. The experimental results show that our model outperforms state-of-the-art methods that have been implemented on both the NGSIM and BLVD datasets.
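
As a rough illustration of the first contribution, the sketch below applies multi-head self-attention first across vehicles at each timestep (the social axis) and then across timesteps for each vehicle (the temporal axis). It is a minimal NumPy sketch, not the authors' implementation: the tensor layout `(T, N, D)`, the helper `multi_head_self_attention`, the random projection weights, and all sizes are assumptions made for illustration, and positional encoding, residual connections, and the LSTM encoder are left out.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, num_heads, rng):
    """Self-attention over the `length` axis of x with shape (batch, length, d_model).

    Random matrices stand in for learned projection weights (assumption).
    Returns the attended tensor and the attention weights.
    """
    b, length, d = x.shape
    assert d % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d // num_heads
    w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    def split(t):
        # (batch, length, d_model) -> (batch, heads, length, d_head)
        return t.reshape(b, length, num_heads, d_head).transpose(0, 2, 1, 3)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_head)  # (batch, heads, length, length)
    weights = softmax(scores, axis=-1)
    merged = (weights @ v).transpose(0, 2, 1, 3).reshape(b, length, d)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
T, N, D, NUM_HEADS = 6, 4, 8, 2          # timesteps, vehicles, feature size, heads (all hypothetical)
x = rng.standard_normal((T, N, D))       # embedded past trajectories

# Social step: each timestep is a batch item, attention runs across the N vehicles.
social, w_soc = multi_head_self_attention(x, NUM_HEADS, rng)

# Temporal step: each vehicle is a batch item, attention runs across the T timesteps.
temporal, w_tmp = multi_head_self_attention(social.transpose(1, 0, 2), NUM_HEADS, rng)

print(social.shape, temporal.shape)
```

Swapping the first two axes between the two calls is the only change needed to move the attention from the social axis to the temporal axis, which is why a single helper serves both steps in this sketch.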

## 2. Related Work

## 3. Methodology

#### 3.1. Problem Formulation

#### 3.2. Network Architecture

#### 3.3. Input Representation

#### 3.4. Social Multi-Head Attention

#### 3.5. Positional Encoding

#### 3.6. Temporal Multi-Head Attention

#### 3.7. LSTM Encoder

#### 3.8. STMHA-LSTM Decoder

## 4. Experiments

#### 4.1. Dataset

#### 4.2. Implementation Details

#### 4.3. Hyperparameter Settings

#### 4.4. Evaluation Metrics

#### 4.5. Ablative Analysis

- **MALSwoTA:** This variant of the model excludes the TMHA layer which extracts the temporal correlation. It thus also excludes the PE module and the output from the SMHA is directly fed into the LSTM encoder.
- **MALSwoSA:** This variant of the model excludes the SMHA layer which extracts the social correlation. The DE module is also excluded from this variant as that contributed to the social correlation information included in the encoded tensor. The output from the IE layer is directly fed into the PE and then the TMHA layer with the dimension ${D}_{model}$ after the input embedding (IE).
- **MALSwoLE:** This variant of the model excludes the LSTM encoder. The hidden tensors for the LSTM decoder are prepared via some specific operations including an additional MLP that maps the ${D}_{model}$ out of the TMHA layer to a size of ${D}_{hid}$.
- **MALSwoLED:** This variant of the model is a version of the original transformer architecture. It excludes both the LSTM encoder and decoder. The encoded input from the STMHA is directly fed into another STMHA-based decoder similar to the transformer decoder and the right-shifted outputs are then fed into the decoder to make successive predictions.
- **MALSwoLDA:** This variant excludes the STMHA block between successive decoding steps. The hidden information from the LSTM is passed onto the next decoding step without extracting further socio-temporal context from it in making successive timestep predictions.
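
The five variants can be read as switches on individual modules. The following sketch encodes that reading as a small configuration table. The class `AblationConfig` and its flag names are hypothetical and chosen only to mirror the module names used in the descriptions above (SMHA, DE, PE, TMHA, LSTM encoder, LSTM decoder, decoder-side STMHA); it does not build any network.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class AblationConfig:
    """Hypothetical module switches; True means the module is kept."""
    use_smha: bool = True           # social multi-head attention
    use_de: bool = True             # DE module (tied to the social branch in the text)
    use_pe: bool = True             # positional encoding
    use_tmha: bool = True           # temporal multi-head attention
    use_lstm_encoder: bool = True
    use_lstm_decoder: bool = True
    use_decoder_stmha: bool = True  # STMHA block between successive decoding steps


FULL = AblationConfig()

VARIANTS = {
    "MALS-Net": FULL,
    "MALSwoTA": replace(FULL, use_tmha=False, use_pe=False),
    "MALSwoSA": replace(FULL, use_smha=False, use_de=False),
    # The text adds an MLP mapping D_model to D_hid in place of the LSTM encoder.
    "MALSwoLE": replace(FULL, use_lstm_encoder=False),
    # Transformer-like variant: the decoder is STMHA-based, so that flag stays on.
    "MALSwoLED": replace(FULL, use_lstm_encoder=False, use_lstm_decoder=False),
    "MALSwoLDA": replace(FULL, use_decoder_stmha=False),
}


def disabled_modules(cfg):
    """Return the sorted names of the switches that are turned off."""
    return sorted(name for name, enabled in asdict(cfg).items() if not enabled)


for name, cfg in VARIANTS.items():
    print(f"{name:10s} disables: {disabled_modules(cfg) or 'nothing'}")
```

Keeping the variants as differences from one full configuration makes it easy to check that each ablation removes exactly the modules its description names.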

#### 4.6. Comparative Analysis

- **CV** [16]: This method assumes a constant velocity and applies a Kalman Filter to predict future trajectories.
- **V-LSTM** [3]: This method uses a simple LSTM-based encoder-decoder model to make predictions.
- **S-LSTM** [3]: This method uses a social pooling technique to sum the neighboring vehicle features via an LSTM to predict trajectories.
- **CS-LSTM** [3]: This method models the traffic in grids and utilizes the convolution operation to extract social interaction and predict future trajectories.
- **DSCAN** [46]: This method uses a constraint network and models attention between vehicles to extract the weights to make future predictions.
- **SGAN** [36]: This method uses an adversarial network architecture that utilizes an encoder-decoder structure as well as a discriminator to make trajectory predictions.
- **HMNet** [47]: This model utilizes a hierarchical context-free LSTM encoder-decoder to forecast the trajectories.
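
To make the simplest baseline concrete, the sketch below extrapolates a track under the constant-velocity assumption. It is a deliberately reduced illustration: the Kalman Filter that the CV baseline applies is omitted, the velocity is estimated from the last two observed positions only, and the function name `constant_velocity_forecast`, the time step `dt`, and the sample track are assumptions.

```python
def constant_velocity_forecast(history, horizon, dt=1.0):
    """Extrapolate future (x, y) positions assuming the last observed velocity stays constant.

    history: list of (x, y) positions sampled every `dt` seconds, oldest first.
    horizon: number of future steps to predict.
    Note: plain extrapolation, with no Kalman filtering of the observations.
    """
    if len(history) < 2:
        raise ValueError("need at least two observed positions to estimate velocity")
    (x_prev, y_prev), (x_last, y_last) = history[-2], history[-1]
    vx = (x_last - x_prev) / dt
    vy = (y_last - y_prev) / dt
    return [(x_last + vx * dt * k, y_last + vy * dt * k) for k in range(1, horizon + 1)]


# Hypothetical track: the vehicle moves 1.0 m in x and 0.5 m in y per step.
observed = [(0.0, 0.0), (1.0, 0.5)]
print(constant_velocity_forecast(observed, horizon=3))
```

Because the velocity is frozen at its last value, the error of such a baseline tends to grow with the prediction horizon whenever the vehicle accelerates or changes lanes, which is consistent with the CV row having the largest long-horizon RMSE among the compared methods.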

#### 4.7. Prediction Visualization

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
CNN | Convolutional Neural Network |
RNN | Recurrent Neural Network |
GNN | Graph Neural Network |
NLP | Natural Language Processing |
LSTM | Long Short-Term Memory |
GRU | Gated Recurrent Unit |

## References

1. Messaoud, K.; Yahiaoui, I.; Verroust-Blondet, A.; Nashashibi, F. Attention based Vehicle Trajectory Prediction. IEEE Trans. Intell. Vehicles **2021**, 6, 175–185.
2. Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social LSTM: Human Trajectory Prediction in Crowded Spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 18–20 June 2016; pp. 961–971.
3. Deo, N.; Trivedi, M.M. Convolutional Social Pooling for Vehicle Trajectory Prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 1468–1476.
4. Wang, R.; Li, M.; Zhang, P.; Wen, F. Graph Partition Convolution Neural Network for Pedestrian Trajectory Prediction. In Proceedings of the 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), Washington, DC, USA, 1–3 November 2021; pp. 8994–9003.
5. Wang, Y.; Zhao, S.; Zhang, R.; Cheng, X.; Yang, L. Multi-Vehicle Collaborative Learning for Trajectory Prediction with Spatio-Temporal Tensor Fusion. IEEE Trans. Intell. Transp. Syst. **2020**, 23, 236–248.
6. Li, X.; Ying, X.; Chuah, M.C. GRIP++: Enhanced Graph-based Interaction-Aware Trajectory Prediction for Autonomous Driving. arXiv **2019**, arXiv:1907.07792.
7. Zhou, H.; Ren, D.; Xia, H.; Fan, M.; Yang, X.; Huang, H. AST-GNN: An Attention-based Spatio-Temporal Graph Neural Network for Interaction-Aware Pedestrian Trajectory Prediction. Neurocomputing **2021**, 445, 298–308.
8. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. IEEE Adv. Neural Inf. Process. Syst. **2017**, 5998–6008.
9. Chen, K.; Chen, G.; Xu, D.; Zhang, L.; Huang, Y.; Knoll, A. NAST: Non-autoregressive Spatial-Temporal Transformer for Time Series Forecasting. arXiv **2021**, arXiv:2102.05624.
10. Next Generation Simulation (NGSIM). Available online: http://ops.fhwa.dot.gov/trafficanalysistools/ngsim.html (accessed on 20 September 2022).
11. Coifman, B.; Li, L. A Critical Evaluation of the Next Generation Simulation (NGSIM) Vehicle Trajectory Dataset. Transp. Res. Part B Methodol. **2017**, 105, 362–377.
12. Thiemann, C.; Treiber, M.; Kesting, A. Estimating Acceleration and Lane-Changing Dynamics from Next Generation Simulation Trajectory Data. Trans. Res. Rec. **2008**, 2088, 90–101.
13. Hamdar, S.; Mahmassani, H. Driver Car-Following Behavior: From Discrete Event Process to Continuous Set of Episodes. In Proceedings of the 87th Annual Meeting of the Transportation Research Board, Washington, DC, USA, 13–17 January 2008.
14. Duret, A.; Buisson, C.; Chiabaut, N. Estimating Individual Speed-Spacing Relationship and Assessing Ability of Newell’s Car-following Model to Reproduce Trajectories. Trans. Res. Rec. **2008**, 2088, 188–197.
15. Montanino, M.; Punzo, V. Trajectory Data Reconstruction and Simulation-based Validation against Macroscopic Traffic Patterns. Trans. Res. Part B **2015**, 80, 82–106.
16. Fei, R.; Li, S.; Hei, X.; Xu, Q.; Zhao, J.; Guo, Y. A Motion Simulation Model for Road Network based Crowdsourced Map Datum. J. Intell. Fuzzy Syst. **2020**, 38, 391–407.
17. Houenou, A.; Bonnifait, P.; Cherfaoui, V.; Yao, W. Vehicle Trajectory Prediction based on Motion Model and Maneuver Recognition. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 4363–4369.
18. Elnagar, P. Prediction of Moving Objects in Dynamic Environments using Kalman Filters. In Proceedings of the 2001 IEEE International Symposium on Computational Intelligence in Robotics and Automation (Cat. No.01EX515), Banff, AB, Canada, 29 July–1 August 2001; pp. 414–419.
19. Qiao, S.J.; Jin, K.; Han, N.; Tang, C.J.; Gesangduoji, G. Trajectory Prediction Algorithm based on Gaussian Mixture Model. J. Softw. **2015**, 26, 1048–1063.
20. Qiao, S.; Shen, D.; Wang, X.; Han, N.; Zhu, W. A Self-Adaptive Parameter Selection Trajectory Prediction Approach via Hidden Markov Models. IEEE Trans. Intell. Transp. Syst. **2014**, 16, 284–296.
21. Treiber, M.; Hennecke, A.; Helbing, D. Congested Traffic States in Empirical Observations and Microscopic Simulations. Phys. Rev. E **2000**, 62, 1805.
22. Deo, N.; Rangesh, A.; Trivedi, M.M. How Would Surround Vehicles Move? a unified framework for maneuver classification and motion prediction. IEEE Trans. Intell. Veh. **2018**, 3, 129–140.
23. Huang, C.; Huang, H.; Zhang, J.; Hang, P.; Hu, Z.; Lv, C. Human-Machine Cooperative Trajectory Planning and Tracking for Safe Automated Driving. IEEE Trans. Intell. Transp. Syst. **2022**, 23, 12050–12063.
24. Huang, C.; Hang, P.; Hu, A.; Lv, C. Collision-Probability-Aware Human-Machine Cooperative Planning for Safe Automated Driving. IEEE Trans. Veh. Technol. **2021**, 70, 9752–9763.
25. Zhang, Y.; Hang, P.; Huang, C.; Lv, C. Human-Like Interactive Behavior Generation for Autonomous Vehicles: A Bayesian Game-Theoretic Approach with Turing Test. Adv. Intell. Syst. **2022**, 4, 2100211.
26. Gomes, I.; Wolf, D. A Review on Intention-aware and Interaction-aware Trajectory Prediction for Autonomous Vehicles. TechRxiv **2022**, 14. Available online: https://www.techrxiv.org/articles/preprint/A_Review_on_Intention-aware_and_Interaction-aware_Trajectory_Prediction_for_Autonomous_Vehicles/19337447/1 (accessed on 15 November 2022).
27. Tomar, R.S.; Verma, S.; Tomar, G.S. SVM Based Trajectory Predictions of Lane Changing Vehicles. In Proceedings of the 2011 International Conference on Computational Intelligence and Communication Networks, Gwalior, India, 7–9 October 2011; pp. 716–721.
28. Chen, X.; Yang, J.; Ye, Q.; Liang, J. Recursive Projection Twin Support Vector Machine via Within-class Variance Minimization. Pattern Recognit. **2011**, 44, 2643–2655.
29. Goli, S.A.; Far, B.H.; Fapojuwo, A.O. Vehicle Trajectory Prediction with Gaussian Process Regression in Connected Vehicle Environment. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 550–555.
30. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. **1997**, 9, 1735–1780.
31. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv **2016**, arXiv:1412.3555.
32. Althoff, M.; Mergel, A. Comparison of Markov Chain Abstraction and Monte Carlo Simulation for the Safety Assessment of Autonomous Cars. IEEE Trans. Intell. Transp. Syst. **2011**, 12, 1237–1247.
33. Hillenbrand, J.; Spieker, A.M.; Kroschel, K. A Multilevel Collision Mitigation Approach—Its Situation Assessment, Decision Making, and Performance Tradeoffs. IEEE Trans. Intell. Transp. Syst. **2006**, 7, 528–540.
34. Xing, Y.; Lv, C.; Mo, X.; Hu, Z.; Huang, C.; Hang, P. Toward Safe and Smart Mobility: Energy-Aware Deep Learning for Driving Behavior Analysis and Prediction of Connected Vehicles. IEEE Trans. Intell. Transp. Syst. **2021**, 22, 4267–4280.
35. Xing, Y.; Huang, C.; Lv, C.; Liu, Y.; Wang, H.; Cao, D. A Personalized Deep Learning Approach for Trajectory Prediction of Connected Vehicles; SAE International: Warrendale, PA, USA, 2020.
36. Gupta, A.; Johnson, J.; Fei-Fei, L.; Savarese, S.; Alahi, A. Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2255–2264.
37. Mo, X.; Xing, Y.; Lv, C. Interaction-Aware Trajectory Prediction of Connected Vehicles using CNN-LSTM Networks. In Proceedings of the IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 18–21 October 2020; pp. 5057–5062.
38. Zhang, P.; Ouyang, W.; Zhang, P.; Xue, J.; Zheng, N. SR-LSTM: State Refinement for LSTM towards Pedestrian Trajectory Prediction. arXiv **2019**, arXiv:1903.02793.
39. Yu, C.; Ma, X.; Ren, J.; Zhao, H.; Yi, S. Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; Volume 12357, pp. 507–523.
40. Pang, Y.; Zhao, X.; Hu, J.; Yan, H.; Liu, Y. Bayesian Spatio-Temporal Graph Transformer Network (B-Star) for Multi-Aircraft Trajectory Prediction. Available online: https://ssrn.com/abstract=3981312 (accessed on 30 September 2022).
41. Xue, J.; Fang, J.; Li, T.; Zhang, B.; Zhang, P.; Ye, Z.; Dou, J. Blvd: Building a Large-Scale 5D Semantics Benchmark for Autonomous Driving. arXiv **2019**, arXiv:1903.06405.
42. Santurkar, S.; Tsipras, D.; Ilyas, A.; Madry, A. How Does Batch Normalization Help Optimization? arXiv **2018**, arXiv:1805.11604v5.
43. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. J. Mach. Learn. Res. Proc. Track **2010**, 9, 249–256.
44. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training Recurrent Neural Networks. arXiv **2012**, arXiv:1211.5063.
45. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv **2012**, arXiv:1412.6980.
46. Yu, J.; Zhou, M.; Wang, X.; Pu, G.; Cheng, C.; Chen, B. A Dynamic and Static Context-Aware Attention Network for Trajectory Prediction. ISPRS Int. J. Geo-Inf. **2021**, 10, 336.
47. Xue, Q.; Li, S.; Li, X.; Zhao, J.; Zhang, W. Hierarchical motion encoder–decoder network for trajectory forecasting. arXiv **2021**, arXiv:2111.13324.
48. Krajewski, R.; Bock, J.; Kloeker, L.; Eckstein, L. The highD dataset: A drone dataset of naturalistic vehicle trajectories on German highways for validation of highly automated driving systems. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 2118–2125.

Model | RMSE-1s | RMSE-2s | RMSE-3s | RMSE-4s | RMSE-5s | Average Improvement |
---|---|---|---|---|---|---|
MALSwoTA | 0.89 | 1.78 | 3.05 | 4.66 | 6.48 | 47.7% |
MALSwoSA | 0.69 | 1.53 | 2.76 | 4.26 | 5.87 | 39.6% |
MALSwoLE | 0.58 | 1.26 | 2.11 | 3.17 | 4.48 | 23.5% |
MALS-Net | 0.48 | 1.01 | 1.60 | 2.31 | 3.36 | |

Model | RMSE-1s | RMSE-2s | RMSE-3s | RMSE-4s | RMSE-5s | Average Improvement |
---|---|---|---|---|---|---|
MALSwoLED | 0.64 | 1.66 | 2.58 | 4.71 | 5.53 | 39.1% |
MALSwoLDA | 0.57 | 1.21 | 2.58 | 3.15 | 4.70 | 25.9% |
MALS-Net | 0.48 | 1.01 | 1.60 | 2.31 | 3.36 | |

${\mathit{d}}_{\mathit{near}}$ (m) Mean | RMSE-1s | RMSE-2s | RMSE-3s | RMSE-4s | RMSE-5s |
---|---|---|---|---|---|
10 | 0.51 | 1.42 | 2.38 | 3.67 | 4.70 |
20 | 0.46 | 1.25 | 1.98 | 2.86 | 3.91 |
30 | 0.43 | 1.21 | 1.96 | 2.86 | 3.88 |
40 | 0.44 | 1.20 | 1.95 | 2.86 | 3.86 |

Model | RMSE-1s | RMSE-2s | RMSE-3s | RMSE-4s | RMSE-5s |
---|---|---|---|---|---|
CV | 0.73 | 1.78 | 3.13 | 4.78 | 6.68 |
V-LSTM | 0.68 | 1.65 | 2.91 | 4.46 | 6.27 |
S-LSTM | 0.65 | 1.31 | 2.16 | 3.25 | 4.55 |
CS-LSTM | 0.61 | 1.27 | 2.09 | 3.10 | 4.37 |
DSCAN | 0.58 | 1.26 | 2.03 | 2.98 | 4.13 |
SGAN | 0.57 | 1.32 | 2.22 | 3.26 | 4.40 |
HMNet | 0.50 | 1.13 | 1.89 | 2.85 | 4.04 |
MALS-Net | 0.48 | 1.01 | 1.60 | 2.31 | 3.36 |
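
One way to read the comparative table above is as a per-horizon relative RMSE reduction of MALS-Net over each baseline. The sketch below computes that quantity from the tabulated values. The formula `100 * (baseline - ours) / baseline` and the helper name `relative_reduction` are assumptions for illustration; the excerpt does not state how the "Average Improvement" figures in the ablation tables were derived, so these numbers are not claimed to reproduce them.

```python
# RMSE values copied from the comparative table above, for horizons 1 s to 5 s.
RMSE = {
    "CV": (0.73, 1.78, 3.13, 4.78, 6.68),
    "V-LSTM": (0.68, 1.65, 2.91, 4.46, 6.27),
    "S-LSTM": (0.65, 1.31, 2.16, 3.25, 4.55),
    "CS-LSTM": (0.61, 1.27, 2.09, 3.10, 4.37),
    "DSCAN": (0.58, 1.26, 2.03, 2.98, 4.13),
    "SGAN": (0.57, 1.32, 2.22, 3.26, 4.40),
    "HMNet": (0.50, 1.13, 1.89, 2.85, 4.04),
    "MALS-Net": (0.48, 1.01, 1.60, 2.31, 3.36),
}


def relative_reduction(baseline, ours):
    """Percentage RMSE reduction of `ours` relative to `baseline`, per horizon."""
    return [100.0 * (b - o) / b for b, o in zip(baseline, ours)]


reductions = {
    name: relative_reduction(values, RMSE["MALS-Net"])
    for name, values in RMSE.items()
    if name != "MALS-Net"
}

for name, per_horizon in reductions.items():
    cells = " ".join(f"{v:6.1f}%" for v in per_horizon)
    print(f"{name:8s} {cells}")
```

Under this reading, the gap to the strongest baseline in the table (HMNet) widens from the 1 s horizon to the 5 s horizon.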

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Hasan, F.; Huang, H.
MALS-Net: A Multi-Head Attention-Based LSTM Sequence-to-Sequence Network for Socio-Temporal Interaction Modelling and Trajectory Prediction. *Sensors* **2023**, *23*, 530.
https://doi.org/10.3390/s23010530
