InfoSTGCAN: An Information-Maximizing Spatial-Temporal Graph Convolutional Attention Network for Heterogeneous Human Trajectory Prediction
Abstract
1. Introduction
1.1. Literature Review
1.1.1. Pedestrian Trajectory Prediction
1.1.2. Graph Neural Networks
1.2. Contributions
 We formulate the task of pedestrian trajectory prediction as a spatial-temporal graph and propose a novel trajectory prediction model, InfoSTGCAN. This model accounts for both pedestrian interactions and heterogeneous behavior choices. Through a comprehensive set of experiments, we demonstrate that InfoSTGCAN outperforms existing baseline methods.
 Our proposed method integrates spatial-temporal graph convolution and spatial-temporal graph attention. This fusion enables our method to model pedestrian interactions more effectively by evaluating pedestrian importance using a combination of prior knowledge and data-driven features.
 Based on the technique of variational mutual information maximization, our model generates an individual-level latent code for each pedestrian. These distinct latent codes facilitate the generation of trajectories with heterogeneous behavior choices.
2. Problem Statement
3. Methodology
3.1. Spatial-Temporal Graph Convolutional Attention Network
3.1.1. Spatial-Temporal Graph Representation of Pedestrian Trajectories
3.1.2. Spatial-Temporal Graph Convolution
3.1.3. Spatial-Temporal Graph Attention
 In STGC, information from neighboring nodes is communicated by applying convolution filters (kernels) on the graph, which typically amounts to a weighted sum of features across neighboring nodes. These weights can be identical (e.g., GraphSAGE [63]), predetermined, or learnable [60,70]. The weights are therefore considered to be “explicitly” assigned to the neighbors of the focal node during the aggregation process [59].
 In STGAT, by contrast, the weights between two connected nodes are considered to be “implicitly” computed. Specifically, they are learned from the similarity of the nodes' feature representations, which accounts for the relative importance of different node pairs [59,73]. Typically, more important nodes have higher similarity scores and are therefore assigned larger weights.
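The contrast between explicit and implicit weights can be illustrated with a small NumPy sketch. It is not the paper's implementation: the row-normalized adjacency, the scaled dot-product similarity, and the names `explicit_aggregate`, `implicit_aggregate`, `w_qry`, and `w_key` are all assumptions chosen for illustration.

```python
import numpy as np

def explicit_aggregate(features, adjacency):
    """Convolution-style aggregation: weights are fixed by the adjacency matrix.

    Assumption: row-normalized adjacency with self-loops (illustrative choice).
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])    # add self-loops
    weights = a_hat / a_hat.sum(axis=1, keepdims=True)
    return weights @ features, weights

def implicit_aggregate(features, adjacency, w_qry, w_key):
    """Attention-style aggregation: weights come from feature similarity.

    Assumption: scaled dot-product similarity between query and key projections.
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])
    qry = features @ w_qry
    key = features @ w_key
    scores = qry @ key.T / np.sqrt(key.shape[1])
    scores = np.where(a_hat > 0, scores, -np.inf)     # only connected pairs attend
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ features, weights
```

In both cases each row of `weights` sums to one; the difference is that the first set of weights is known before any features are seen, while the second is recomputed from the features of every node pair.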
3.2. Variational Mutual Information Maximization
 In ref. [85], there is only one latent code for each training example, whereas in this paper there are multiple latent codes for each training example. Different pedestrians may have distinct preferences and walking styles, so it is generally unrealistic to assume that all pedestrians share the same preference or walking style. Therefore, each pedestrian n has its own latent code ${c}^{n}$, and different pedestrians generally have different latent codes, allowing the proposed framework to effectively model the latent patterns in pedestrian trajectories.
 In this paper, the proposed information-theoretic loss is based on the conditional mutual information, whereas in ref. [85] the loss is based on the (unconditional) mutual information.
 Unlike ref. [85], where the prior latent code distribution is assumed to be fixed, we also optimize the prior distribution ${P}_{\theta}({c}^{n}\mid t{r}_{obs}^{n})$.
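For orientation, the variational lower bound on the conditional mutual information can be sketched as below. This is the standard argument of ref. [85] written with an extra conditioning on the observed trajectory; it is an illustrative derivation under that assumption, not one reproduced from the paper. Expectations are taken over ${c}^{n}\sim {P}_{\theta}({c}^{n}\mid t{r}_{obs}^{n})$ and $t{r}_{pred}^{n}\sim G(t{r}_{obs}^{n},{c}^{n})$, and ${Q}_{\varphi}$ denotes the approximate posterior.

$$\begin{aligned} I({c}^{n};t{r}_{pred}^{n}\mid t{r}_{obs}^{n}) &= H({c}^{n}\mid t{r}_{obs}^{n})-H({c}^{n}\mid t{r}_{obs}^{n},t{r}_{pred}^{n})\\ &= H({c}^{n}\mid t{r}_{obs}^{n})+\mathbb{E}\left[\log P({c}^{n}\mid t{r}_{obs}^{n},t{r}_{pred}^{n})\right]\\ &= H({c}^{n}\mid t{r}_{obs}^{n})+\mathbb{E}\left[\log {Q}_{\varphi}({c}^{n}\mid t{r}_{obs}^{n},t{r}_{pred}^{n})\right]+\mathbb{E}\left[{D}_{\mathrm{KL}}\left(P\,\Vert\,{Q}_{\varphi}\right)\right]\\ &\ge H({c}^{n}\mid t{r}_{obs}^{n})+\mathbb{E}\left[\log {Q}_{\varphi}({c}^{n}\mid t{r}_{obs}^{n},t{r}_{pred}^{n})\right]. \end{aligned}$$

The inequality holds because the KL divergence is non-negative, and it is tight when ${Q}_{\varphi}$ matches the true posterior. Because the prior ${P}_{\theta}$ is learned rather than fixed, the entropy term $H({c}^{n}\mid t{r}_{obs}^{n})$ is not a constant and stays in the objective.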
3.3. Multi-Objective Loss Function
 ${L}_{\mathrm{pred}}$: The prediction loss relies on the negative log-likelihood, which is defined as:$$\begin{array}{cc}\hfill {L}_{\mathrm{pred}}& =-\sum _{n}\log \left(P\left(t{r}_{pred}^{n}\mid G(t{r}_{obs}^{n},{c}^{n})\right)\right).\hfill \end{array}$$
 ${L}_{\mathrm{GAN}}$: The generative adversarial loss relies on the generator G and the discriminator D, in which two models are jointly trained. The generator G captures the distribution for the future trajectory, and the discriminator distinguishes whether a sample comes from the training data or the generator G.$$\begin{array}{cc}\hfill {L}_{\mathrm{GAN}}& =\mathbb{E}\left[\log \left(D(t{r}_{obs}^{n},t{r}_{pred}^{n})\right)\right]+\mathbb{E}\left[\log \left(1-D(t{r}_{obs}^{n},G(t{r}_{obs}^{n},{c}^{n}))\right)\right]\hfill \end{array}$$
 ${L}_{\mathrm{info}}$: The information-theoretic loss relies on the conditional prior distribution ${P}_{\theta}({c}^{n}\mid t{r}_{obs}^{n})$, the model $G(t{r}_{obs}^{n},{c}^{n})$, and the approximate posterior ${Q}_{\varphi}({c}^{n}\mid t{r}_{obs}^{n},t{r}_{pred}^{n})$, which has been discussed in Section 3.2.$$\begin{array}{cc}\hfill {L}_{\mathrm{info}}& =-\underset{{L}_{I}}{\underbrace{\left({\mathbb{E}}_{{c}^{n}\sim {P}_{\theta}({c}^{n}\mid t{r}_{obs}^{n}),t{r}_{pred}^{n}\sim G(t{r}_{obs}^{n},{c}^{n})}\left[\log {Q}_{\varphi}({c}^{n}\mid t{r}_{obs}^{n},t{r}_{pred}^{n})\right]+H({c}^{n}\mid t{r}_{obs}^{n})\right)}}\hfill \end{array}$$
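The three terms above can be combined as in the following NumPy sketch. Several points are assumptions rather than details from the paper: the total loss is taken to be the weighted sum $\lambda_1 L_{\mathrm{pred}} + \lambda_2 L_{\mathrm{GAN}} + \lambda_3 L_{\mathrm{info}}$, the information term is taken to enter with a negative sign so that minimizing it maximizes the bound $L_I$, the latent code is treated as categorical, and expectations are replaced by batch means. All function names are hypothetical.

```python
import numpy as np

def prediction_loss(log_prob_true):
    """L_pred: negative log-likelihood, summed over pedestrians."""
    return -np.sum(log_prob_true)

def gan_loss(d_real, d_fake):
    """L_GAN: E[log D(real)] + E[log(1 - D(fake))], with D outputs in (0, 1)."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def info_loss(log_q, prior_probs):
    """L_info = -(E[log Q] + H(c | tr_obs)).

    Assumption: categorical prior; prior_probs has shape
    (pedestrians, codes) with strictly positive entries.
    """
    entropy = -np.sum(prior_probs * np.log(prior_probs), axis=1)
    return -(np.mean(log_q) + np.mean(entropy))

def total_loss(l_pred, l_gan, l_info, lam1=1.0, lam2=1.0, lam3=0.06):
    """Weighted sum of the three terms (assumed form of the objective)."""
    return lam1 * l_pred + lam2 * l_gan + lam3 * l_info
```

The default weights mirror the first row of the ablation table; setting `lam2=0.0` or `lam3=1.0` corresponds to the other two rows, under the assumed mapping of $\lambda_2$ to the GAN term and $\lambda_3$ to the information term.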
Algorithm 1: Training Procedure of InfoSTGCAN 

4. Experiments and Results
4.1. Datasets and Evaluation Metrics
 Average Displacement Error (ADE): The average ${L}_{2}$ distance between the predicted trajectory and the ground truth trajectory across all time steps, which is defined as follows:$$\begin{array}{c}\hfill \mathrm{ADE}=\frac{1}{N{T}_{pred}}\sum _{n=1}^{N}\sum _{t=1}^{{T}_{pred}}\sqrt{{({x}_{t}^{n}-{\widehat{x}}_{t}^{n})}^{2}+{({y}_{t}^{n}-{\widehat{y}}_{t}^{n})}^{2}}.\end{array}$$
 Final Displacement Error (FDE): The ${L}_{2}$ distance between the predicted final destination and the true final destination at the end of the prediction period ${T}_{pred}$, which is defined as follows:$$\begin{array}{c}\hfill \mathrm{FDE}=\frac{1}{N}\sum _{n=1}^{N}\sqrt{{({x}_{{T}_{pred}}^{n}-{\widehat{x}}_{{T}_{pred}}^{n})}^{2}+{({y}_{{T}_{pred}}^{n}-{\widehat{y}}_{{T}_{pred}}^{n})}^{2}}.\end{array}$$
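Both metrics reduce to a few array operations. The sketch below assumes trajectories are stored as arrays of shape (N, T_pred, 2) holding (x, y) positions; the function name `ade_fde` is hypothetical.

```python
import numpy as np

def ade_fde(pred, truth):
    """Return (ADE, FDE) for arrays of shape (N, T_pred, 2)."""
    # Euclidean distance per pedestrian and time step, shape (N, T_pred)
    dist = np.linalg.norm(pred - truth, axis=-1)
    ade = dist.mean()          # average over pedestrians and time steps
    fde = dist[:, -1].mean()   # average over pedestrians at the final step
    return ade, fde
```

For example, if every prediction is correct except for a (3, 4) offset at the final step of a three-step horizon, the FDE is 5 and the ADE is 5/3.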
4.2. Implementation Details
4.3. Results Analysis
4.3.1. Comparison with Baseline Models
 Linear: A linear regression model fitted by minimizing the least-squares error.
 Social LSTM (SLSTM) [27]: An LSTM approach that incorporates the “social pooling” mechanism for hidden states.
 SGANPooling [34]: A GAN-based approach that utilizes global pooling for pedestrian interactions.
 SRLSTM2 [29]: An LSTM-based method that leverages a state refinement technique.
 GAT [55]: A graph attention network leveraging the sequence-to-sequence architecture.
 Sophie [35]: A GAN-based method that takes both scene and social factors into account through a dual attention mechanism.
 SCAN [58]: An LSTM-based encoder–decoder framework that incorporates a novel spatial attention mechanism to predict trajectories for all pedestrians.
 SocialSTGCNN [57]: A spatial-temporal graph-based approach that employs a spatial-temporal graph convolutional network to handle complex social interactions.
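As a point of reference for the simplest baseline in the list, a least-squares linear predictor can be sketched as follows. The paper does not spell out the regression setup, so this assumes one independent fit of position against time per pedestrian and coordinate, followed by extrapolation; `linear_baseline` and its parameters are hypothetical names.

```python
import numpy as np

def linear_baseline(obs, t_pred):
    """Extrapolate one pedestrian's path with a least-squares line.

    obs: array of shape (T_obs, 2) with observed (x, y) positions.
    Returns an array of shape (t_pred, 2) with predicted positions.
    """
    t_obs = obs.shape[0]
    steps = np.arange(t_obs)
    # Fit x(t) and y(t) jointly; polyfit accepts a 2-D target array.
    slope, intercept = np.polyfit(steps, obs, deg=1)
    future = np.arange(t_obs, t_obs + t_pred)
    return future[:, None] * slope + intercept
```

A pedestrian walking at constant velocity is predicted exactly by this model, which is why it serves as a sanity-check baseline rather than a competitive method.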
4.3.2. Results Visualization
4.3.3. Interpretable Latent Representation
4.4. Ablation Study
 Demonstrate the crucial role of the GAN loss part.
 Highlight the significance of maintaining a balanced weight between ${L}_{\mathrm{pred}}$ and ${L}_{\mathrm{info}}$.
5. Conclusions and Future Research
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Major Notations  
Trajectory  
N  number of pedestrians 
$t{r}_{obs}^{n}$  observed trajectory for the ${n}^{th}$ pedestrian 
$t{r}_{pred}^{n}$  future ground-truth trajectory for the ${n}^{th}$ pedestrian 
${T}_{obs}$  length of observed trajectories 
${T}_{pred}$  length of predicted trajectories 
$({\mathit{x}}_{t}^{n},{\mathit{y}}_{t}^{n})$  random variables describing the location of the ${n}^{th}$ pedestrian at time step t 
$({\widehat{\mathit{x}}}_{t}^{n},{\widehat{\mathit{y}}}_{t}^{n})$  predicted location of the ${n}^{th}$ pedestrian at time step t 
Spatial-Temporal Graph  
${\mathcal{G}}_{t}$  spatial graph at step t 
${\mathcal{G}}_{1:T}$  spatial-temporal graph 
${V}_{t}$  set of vertices for ${\mathcal{G}}_{t}$ 
${E}_{t}$  set of edges for ${\mathcal{G}}_{t}$ 
${A}_{t}$  adjacency matrix for ${\mathcal{G}}_{t}$ 
I  identity matrix 
Variational Mutual Information Maximization  
G  generator 
D  discriminator 
${P}_{\theta}({c}^{n}\mid t{r}_{obs}^{n})$  conditional prior distribution for the latent code ${c}^{n}$ 
$P({c}^{n}\mid t{r}_{obs}^{n},t{r}_{pred}^{n})$  posterior distribution for the latent code ${c}^{n}$ 
${Q}_{\varphi}({c}^{n}\mid t{r}_{obs}^{n},t{r}_{pred}^{n})$  approximate posterior distribution for ${c}^{n}$ 
Spatial-Temporal Graph Convolution  
${\mathrm{feature}}^{\left(l\right)}$  feature map at layer l 
${\mathrm{feature}}^{(l+1)}$  feature map at layer $l+1$ 
$\mathbf{p}(\cdot)$  sampling function 
${\mathbf{w}}^{\left(l\right)}$  weight function at layer l 
Spatial-Temporal Graph Attention  
Qry  query of the attention mechanism 
Key  key of the attention mechanism 
Val  value of the attention mechanism 
References
1. Hashimoto, Y.; Gu, Y.; Hsu, L.T.; Iryo-Asano, M.; Kamijo, S. A probabilistic model of pedestrian crossing behavior at signalized intersections for connected vehicles. Transp. Res. Part C Emerg. Technol. 2016, 71, 164–181.
2. Haghani, M. Empirical methods in pedestrian, crowd and evacuation dynamics: Part I. Experimental methods and emerging topics. Saf. Sci. 2020, 129, 104743.
3. Bahari, M.; Nejjar, I.; Alahi, A. Injecting knowledge in data-driven vehicle trajectory predictors. Transp. Res. Part C Emerg. Technol. 2021, 128, 103010.
4. Kalatian, A.; Farooq, B. A context-aware pedestrian trajectory prediction framework for automated vehicles. Transp. Res. Part C Emerg. Technol. 2022, 134, 103453.
5. Bautista-Montesano, R.; Galluzzi, R.; Ruan, K.; Fu, Y.; Di, X. Autonomous navigation at unsignalized intersections: A coupled reinforcement learning and model predictive control approach. Transp. Res. Part C Emerg. Technol. 2022, 139, 103662.
6. Mo, Z.; Li, W.; Fu, Y.; Ruan, K.; Di, X. CVLight: Decentralized learning for adaptive traffic signal control with connected vehicles. Transp. Res. Part C Emerg. Technol. 2022, 141, 103728.
7. Wang, Z.; Sun, P.; Hu, Y.; Boukerche, A. A novel mixed method of machine learning based models in vehicular traffic flow prediction. In Proceedings of the 25th International ACM Conference on Modeling Analysis and Simulation of Wireless and Mobile Systems, Montreal, QC, Canada, 24–28 October 2022; ACM: New York, NY, USA, 2022; pp. 95–101.
8. Fu, Y.; Di, X. Federated Reinforcement Learning for Adaptive Traffic Signal Control: A Case Study in New York City. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), New York, NY, USA, 24–28 September 2023; IEEE: New York, NY, USA, 2023; pp. 5738–5743.
9. Musleh, B.; García, F.; Otamendi, J.; Armingol, J.M.; De la Escalera, A. Identifying and tracking pedestrians based on sensor fusion and motion stability predictions. Sensors 2010, 10, 8028–8053.
10. Zangenehpour, S.; Miranda-Moreno, L.F.; Saunier, N. Automated classification based on video data at intersections with heavy pedestrian and bicycle traffic: Methodology and application. Transp. Res. Part C Emerg. Technol. 2015, 56, 161–176.
11. St-Aubin, P.; Saunier, N.; Miranda-Moreno, L. Large-scale automated proactive road safety analysis using video data. Transp. Res. Part C Emerg. Technol. 2015, 58, 363–379.
12. Errico, F.; Crainic, T.G.; Malucelli, F.; Nonato, M. A survey on planning semi-flexible transit systems: Methodological issues and a unifying framework. Transp. Res. Part C Emerg. Technol. 2013, 36, 324–338.
13. Grahn, R.; Qian, S.; Hendrickson, C. Improving the performance of first- and last-mile mobility services through transit coordination, real-time demand prediction, advanced reservations, and trip prioritization. Transp. Res. Part C Emerg. Technol. 2021, 133, 103430.
14. Ma, X.; Karimpour, A.; Wu, Y.J. Data-driven transfer learning framework for estimating on-ramp and off-ramp traffic flows. J. Intell. Transp. Syst. 2024, 1–14. Available online: https://www.tandfonline.com/doi/full/10.1080/15472450.2023.2301696 (accessed on 1 April 2024).
15. Li, T.; Klavins, J.; Xu, T.; Zafri, N.M.; Stern, R. Understanding driver-pedestrian interactions to predict driver yielding: Naturalistic open-source dataset collected in Minnesota. arXiv 2023, arXiv:2312.15113.
16. Yang, H.F.; Ling, Y.; Kopca, C.; Ricord, S.; Wang, Y. Cooperative traffic signal assistance system for non-motorized users and disabilities empowered by computer vision and edge artificial intelligence. Transp. Res. Part C Emerg. Technol. 2022, 145, 103896.
17. Moussaïd, M.; Perozo, N.; Garnier, S.; Helbing, D.; Theraulaz, G. The walking behaviour of pedestrian social groups and its impact on crowd dynamics. PLoS ONE 2010, 5, e10047.
18. Helbing, D.; Molnar, P. Social force model for pedestrian dynamics. Phys. Rev. E 1995, 51, 4282.
19. Hoogendoorn, S.P.; Bovy, P.H. Pedestrian route-choice and activity scheduling theory and models. Transp. Res. Part B Methodol. 2004, 38, 169–190.
20. Antonini, G.; Bierlaire, M.; Weber, M. Discrete choice models of pedestrian walking behavior. Transp. Res. Part B Methodol. 2006, 40, 667–687.
21. Haghani, M.; Sarvi, M. Crowd behaviour and motion: Empirical methods. Transp. Res. Part B Methodol. 2018, 107, 253–294.
22. Ruan, K.; Di, X. Learning human driving behaviors with sequential causal imitation learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 22 February–1 March 2022; Volume 36, pp. 4583–4592.
23. Ruan, K.; Zhang, J.; Di, X.; Bareinboim, E. Causal Imitation for Markov Decision Processes: A Partial Identification Approach. Technical Report R104 (causalai.net/r104.pdf), Causal Artificial Intelligence Lab, Columbia University. May 2024. Available online: https://causalai.net/r104.pdf (accessed on 1 April 2024).
24. Knoblauch, R.L.; Pietrucha, M.T.; Nitzburg, M. Field studies of pedestrian walking speed and start-up time. Transp. Res. Rec. 1996, 1538, 27–38.
25. Do, T.; Haghani, M.; Sarvi, M. Group and single pedestrian behavior in crowd dynamics. Transp. Res. Rec. 2016, 2540, 13–19.
26. Ruan, K.; Zhang, J.; Di, X.; Bareinboim, E. Causal Imitation Learning via Inverse Reinforcement Learning. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
27. Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social lstm: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 961–971.
28. Liang, J.; Jiang, L.; Niebles, J.C.; Hauptmann, A.G.; Fei-Fei, L. Peeking into the future: Predicting future person activities and locations in videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5725–5734.
29. Zhang, P.; Ouyang, W.; Zhang, P.; Xue, J.; Zheng, N. Sr-lstm: State refinement for lstm towards pedestrian trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 12085–12094.
30. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
31. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 214–223.
32. Li, T.; Shang, M.; Wang, S.; Filippelli, M.; Stern, R. Detecting stealthy cyberattacks on automated vehicles via generative adversarial networks. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; IEEE: New York, NY, USA, 2022; pp. 3632–3637.
33. Mo, Z.; Fu, Y.; Xu, D.; Di, X. Trafficflowgan: Physics-informed flow based generative adversarial network for uncertainty quantification. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Grenoble, France, 19–23 September 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 323–339.
34. Gupta, A.; Johnson, J.; Fei-Fei, L.; Savarese, S.; Alahi, A. Social gan: Socially acceptable trajectories with generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2255–2264.
35. Sadeghian, A.; Kosaraju, V.; Sadeghian, A.; Hirose, N.; Rezatofighi, H.; Savarese, S. Sophie: An attentive gan for predicting paths compliant to social and physical constraints. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 1349–1358.
36. Duives, D.C.; Daamen, W.; Hoogendoorn, S.P. State-of-the-art crowd motion simulation models. Transp. Res. Part C Emerg. Technol. 2013, 37, 193–209.
37. Tordeux, A.; Lämmel, G.; Hänseler, F.S.; Steffen, B. A mesoscopic model for large-scale simulation of pedestrian dynamics. Transp. Res. Part C Emerg. Technol. 2018, 93, 128–147.
38. Chraibi, M.; Tordeux, A.; Schadschneider, A.; Seyfried, A. Modelling of pedestrian and evacuation dynamics. In Encyclopedia of Complexity and Systems Science; Springer: Berlin/Heidelberg, Germany, 2018; pp. 1–22.
39. Hoogendoorn, S.P.; Daamen, W.; Knoop, V.L.; Steenbakkers, J.; Sarvi, M. Macroscopic fundamental diagram for pedestrian networks: Theory and applications. Transp. Res. Part C Emerg. Technol. 2018, 94, 172–184.
40. Yuan, Y.; Goñi-Ros, B.; Bui, H.H.; Daamen, W.; Vu, H.L.; Hoogendoorn, S.P. Macroscopic pedestrian flow simulation using Smoothed Particle Hydrodynamics (SPH). Transp. Res. Part C Emerg. Technol. 2020, 111, 334–351.
41. Blue, V.J.; Adler, J.L. Emergent fundamental pedestrian flows from cellular automata microsimulation. Transp. Res. Rec. 1998, 1644, 29–36.
42. Burstedde, C.; Klauck, K.; Schadschneider, A.; Zittartz, J. Simulation of pedestrian dynamics using a two-dimensional cellular automaton. Phys. A Stat. Mech. Its Appl. 2001, 295, 507–525.
43. Zeng, W.; Chen, P.; Nakamura, H.; Iryo-Asano, M. Application of social force model to pedestrian behavior analysis at signalized crosswalk. Transp. Res. Part C Emerg. Technol. 2014, 40, 143–159.
44. Fiorini, P.; Shiller, Z. Motion planning in dynamic environments using velocity obstacles. Int. J. Robot. Res. 1998, 17, 760–772.
45. Van den Berg, J.; Lin, M.; Manocha, D. Reciprocal velocity obstacles for real-time multi-agent navigation. In Proceedings of the 2008 IEEE International Conference on Robotics and Automation, Pasadena, CA, USA, 19–23 May 2008; IEEE: New York, NY, USA, 2008; pp. 1928–1935.
46. Guy, S.J.; Lin, M.C.; Manocha, D. Modeling collision avoidance behavior for virtual humans. In Proceedings of the 9th International Joint Conference on Autonomous Agents and Multiagent Systems 2010, AAMAS, Toronto, ON, Canada, 10 May 2010; Volume 2010, pp. 575–582.
47. Karamouzas, I.; Overmars, M. A velocity-based approach for simulating human collision avoidance. In Proceedings of the Intelligent Virtual Agents: 10th International Conference, IVA 2010, Philadelphia, PA, USA, 20–22 September 2010; Proceedings 10; Springer: Berlin/Heidelberg, Germany, 2010; pp. 180–186.
48. Van Den Berg, J.; Guy, S.J.; Lin, M.; Manocha, D. Reciprocal n-body collision avoidance. In Proceedings of the Robotics Research: The 14th International Symposium ISRR, Lucerne, Switzerland, 31 August–1 September 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 3–19.
49. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
50. Lan, G.; Wang, H.; Anderson, J.; Brinton, C.; Aggarwal, V. Improved Communication Efficiency in Federated Natural Policy Gradient via ADMM-based Gradient Updates. arXiv 2023, arXiv:2310.19807.
51. Wang, Z.; Zhuang, D.; Li, Y.; Zhao, J.; Sun, P.; Wang, S.; Hu, Y. STGIN: An uncertainty quantification approach in traffic data imputation with spatiotemporal graph attention and bidirectional recurrent united neural networks. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, 28 September–1 October 2023; IEEE: New York, NY, USA, 2023; pp. 1454–1459.
52. Che, L.; Wang, J.; Zhou, Y.; Ma, F. Multimodal federated learning: A survey. Sensors 2023, 23, 6986.
53. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
54. Saadatnejad, S.; Bahari, M.; Khorsandi, P.; Saneian, M.; Moosavi-Dezfooli, S.M.; Alahi, A. Are socially-aware trajectory prediction models really socially-aware? Transp. Res. Part C Emerg. Technol. 2022, 141, 103705.
55. Kosaraju, V.; Sadeghian, A.; Martín-Martín, R.; Reid, I.; Rezatofighi, H.; Savarese, S. Social-bigat: Multimodal trajectory forecasting using bicycle-gan and graph attention networks. Adv. Neural Inf. Process. Syst. 2019, 32. Available online: https://proceedings.neurips.cc/paper_files/paper/2019/file/d09bf41544a3365a46c9077ebb5e35c3Paper.pdf (accessed on 1 April 2024).
56. Sun, J.; Jiang, Q.; Lu, C. Recursive social behavior graph for trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 660–669.
57. Mohamed, A.; Qian, K.; Elhoseiny, M.; Claudel, C. Social-stgcnn: A social spatio-temporal graph convolutional neural network for human trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 27 February 2020; pp. 14424–14432.
58. Sekhon, J.; Fleming, C. SCAN: A Spatial Context Attentive Network for Joint Multi-Agent Intent Prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 6119–6127.
59. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Philip, S.Y. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24.
60. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
61. Yu, Z.; Gao, H. Molecular representation learning via heterogeneous motif graph neural networks. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 25581–25594.
62. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. Adv. Neural Inf. Process. Syst. 2016, 29. Available online: https://proceedings.neurips.cc/paper_files/paper/2016/file/04df4d434d481c5bb723be1b6df1ee65Paper.pdf (accessed on 1 April 2024).
63. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive Representation Learning on Large Graphs; Advances in neural information processing systems; Neural Information Processing Systems Foundation: San Diego, CA, USA, 2017; Volume 30.
64. Zhang, M.; Cui, Z.; Neumann, M.; Chen, Y. An end-to-end deep learning architecture for graph classification. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. No. 1.
65. Zhuang, J.; Al Hasan, M. Robust node classification on graphs: Jointly from Bayesian label transition and topology-based label propagation. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 2795–2805.
66. Dong, X.; Wong, R.; Lyu, W.; Abell-Hart, K.; Deng, J.; Liu, Y.; Hajagos, J.G.; Rosenthal, R.N.; Chen, C.; Wang, F. An integrated LSTM-HeteroRGNN model for interpretable opioid overdose risk prediction. Artif. Intell. Med. 2023, 135, 102439.
67. Yu, Z.; Gao, H. Motifexplainer: A motif-based graph neural network explainer. arXiv 2022, arXiv:2202.00519.
68. Guo, K.; Hu, Y.; Qian, Z.; Liu, H.; Zhang, K.; Sun, Y.; Gao, J.; Yin, B. Optimized graph convolution recurrent neural network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2020, 22, 1138–1149.
69. Wu, K.; Zhou, Y.; Shi, H.; Li, X.; Ran, B. Graph-Based Interaction-Aware Multimodal 2D Vehicle Trajectory Prediction Using Diffusion Graph Convolutional Networks. IEEE Trans. Intell. Veh. 2023, 9, 3630–3643.
70. Yan, S.; Xiong, Y.; Lin, D. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32, pp. 7444–7452.
71. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. Available online: https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aaAbstract.html (accessed on 1 April 2024).
72. Ruan, K.; He, X.; Wang, J.; Zhou, X.; Feng, H.; Kebarighotbi, A. S2e: Towards an end-to-end entity resolution solution from acoustic signal. In Proceedings of the ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; IEEE: New York, NY, USA, 2024; pp. 10441–10445.
73. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
74. Liu, Z.; Chen, C.; Li, L.; Zhou, J.; Li, X.; Song, L.; Qi, Y. Geniepath: Graph neural networks with adaptive receptive paths. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 4424–4431.
75. Zhuang, J.; Al Hasan, M. Defending graph convolutional networks against dynamic graph perturbations via bayesian self-supervision. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 28 February–1 March 2022; Volume 36, pp. 4405–4413.
76. Wang, H.; Lian, D.; Ge, Y. Binarized collaborative filtering with distilling graph convolutional networks. arXiv 2019, arXiv:1906.01829.
77. Dong, J.; Chen, S.; Ha, P.Y.J.; Li, Y.; Labi, S. A DRL-based multi-agent cooperative control framework for CAV networks: A graphic convolution Q network. arXiv 2020, arXiv:2010.05437.
78. Lyu, W.; Dong, X.; Wong, R.; Zheng, S.; Abell-Hart, K.; Wang, F.; Chen, C. A multimodal transformer: Fusing clinical notes with structured EHR data for interpretable in-hospital mortality prediction. In Proceedings of the AMIA Annual Symposium Proceedings, Washington, DC, USA, 5–9 November 2022; American Medical Informatics Association: Bethesda, MD, USA; Volume 2022, p. 719.
79. Luong, M.T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. arXiv 2015, arXiv:1508.04025.
80. Lin, F.; Crawford, S.; Guillot, K.; Zhang, Y.; Chen, Y.; Yuan, X.; Chen, L.; Williams, S.; Minvielle, R.; Xiao, X.; et al. MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 5774–5784.
81. Yu, C.; Ma, X.; Ren, J.; Zhao, H.; Yi, S. Spatio-temporal graph transformer networks for pedestrian trajectory prediction. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XII 16; Springer: Berlin/Heidelberg, Germany, 2020; pp. 507–523.
82. Kullback, S. Information Theory and Statistics; Courier Corporation: North Chelmsford, MA, USA, 1997.
83. MacKay, D.J. Information Theory, Inference and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003.
84. Wasserman, L. All of Statistics: A Concise Course in Statistical Inference; Springer: Berlin/Heidelberg, Germany, 2004; Volume 26.
85. Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; Abbeel, P. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. Adv. Neural Inf. Process. Syst. 2016, 29. Available online: https://www.semanticscholar.org/paper/InfoGAN%3AInterpretableRepresentationLearningbyChenDuan/eb7ee0bc355652654990bcf9f92f124688fde493 (accessed on 1 April 2024).
86. Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational inference: A review for statisticians. J. Am. Stat. Assoc. 2017, 112, 859–877.
87. Lin, F.; Yuan, X.; Peng, L.; Tzeng, N.F. Cascade variational autoencoder for hierarchical disentanglement. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 1248–1257.
88. Hjelm, R.D.; Fedorov, A.; Lavoie-Marchildon, S.; Grewal, K.; Bachman, P.; Trischler, A.; Bengio, Y. Learning deep representations by mutual information estimation and maximization. arXiv 2018, arXiv:1808.06670.
89. Jang, E.; Gu, S.; Poole, B. Categorical reparameterization with gumbel-softmax. arXiv 2016, arXiv:1611.01144.
90. Maddison, C.J.; Mnih, A.; Teh, Y.W. The concrete distribution: A continuous relaxation of discrete random variables. arXiv 2016, arXiv:1611.00712.
91. Pellegrini, S.; Ess, A.; Schindler, K.; Van Gool, L. You’ll never walk alone: Modeling social behavior for multi-target tracking. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 28 September–2 October 2009; IEEE: New York, NY, USA, 2009; pp. 261–268.
92. Lerner, A.; Chrysanthou, Y.; Lischinski, D. Crowds by example. In Proceedings of the Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2007; Volume 26, pp. 655–664.
93. Ma, X. Traffic Performance Evaluation Using Statistical and Machine Learning Methods. Ph.D. Thesis, The University of Arizona, Tucson, AZ, USA, 2022.
94. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034.
95. De Maesschalck, R.; Jouan-Rimbaud, D.; Massart, D.L. The mahalanobis distance. Chemom. Intell. Lab. Syst. 2000, 50, 1–18.
96. Mo, Z.; Fu, Y.; Di, X. PI-NeuGODE: Physics-Informed Graph Neural Ordinary Differential Equations for Spatiotemporal Trajectory Prediction. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, Auckland, New Zealand, 6–10 May 2024; pp. 1418–1426.
97. Camara, F.; Merat, N.; Fox, C.W. A heuristic model for pedestrian intention estimation. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; IEEE: New York, NY, USA; pp. 3708–3713.
98. Akopov, A.S.; Beklaryan, L.A.; Beklaryan, A.L. Cluster-based optimization of an evacuation process using a parallel bi-objective real-coded genetic algorithm. Cybern. Inf. Technol. 2020, 20, 45–63.
Category  References  Method  Required Features  Probabilistic or Deterministic  Social Interactions Modeling  Heterogeneity Modeling
Physics-based  [18]  Social Force Model  Positions + Velocities + Destinations  Deterministic  Social force fields  Different characteristics to different agents
Physics-based  [41]  Cellular Automaton Model  Pedestrians (Velocities, Density, …) + Cells + Rules  Probabilistic  Predefined rules  Multiple walking classes
Deep learning-based  [27]  Social LSTM (SLSTM)  Positions  Deterministic  Social pooling  $-$
Deep learning-based  [34]  Social GAN (SGANPooling)  Positions  Probabilistic  MaxPooling  Variety loss
Deep learning-based  [55]  Graph Attention Network (GAT)  Positions + Images  Probabilistic  Social attention  $-$
Deep learning-based  [35]  LSTM-based Generative Adversarial Network (SoPhie)  Positions + Images  Probabilistic  Social attention  $-$
Deep learning-based  [57]  Social Spatio-Temporal Graph Convolutional Neural Network (SocialSTGCNN)  Positions  Probabilistic  Social kernels  $-$
Deep learning-based  [58]  Spatial Context Attentive Network (SCAN)  Positions  Probabilistic  Spatial attention mechanism  $-$
This study  $-$  InfoSTGCAN  Positions  Probabilistic  Social kernels + social attention  Pedestrian-level latent codes
Algorithm  ETH (ADE/FDE)  HOTEL (ADE/FDE)  UNIV (ADE/FDE)  ZARA1 (ADE/FDE)  ZARA2 (ADE/FDE)  AVG (ADE/FDE)
Linear  1.33/2.94  0.39/0.72  0.82/1.59  0.62/1.21  0.77/1.48  0.79/1.59 
SLSTM  1.09/2.35  0.79/1.76  0.67/1.40  0.47/1.00  0.56/1.17  0.72/1.54 
SGANPooling  0.87/1.62  0.67/1.37  0.76/1.52  0.35/0.68  0.42/0.84  0.61/1.21 
SRLSTM2  0.63/1.25  0.37/0.74  0.51/1.10  0.41/0.90  0.32/0.70  0.45/0.94 
GAT  0.68/1.29  0.68/1.40  0.57/1.29  0.29/0.60  0.37/0.75  0.52/1.07 
Sophie  0.70/1.43  0.76/1.67  0.54/1.24  0.30/0.63  0.38/0.78  0.54/1.15 
SCAN  0.84/1.58  0.44/0.90  0.63/1.33  0.31/0.85  0.37/0.76  0.51/1.08 
SocialSTGCNN  0.64/1.11  0.49/0.85  0.44/0.79  0.34/0.53  0.30/0.48  0.44/0.75 
InfoSTGCAN  0.61/0.82  0.48/0.71  0.40/0.64  0.33/0.51  0.30/0.44  0.42/0.62 
InfoSTGCAN loss weights  Performance (ADE/FDE)
${\lambda}_{1}=1.0,{\lambda}_{2}=1.0,{\lambda}_{3}=0.06$  0.48/0.71 
${\lambda}_{1}=1.0,{\lambda}_{2}=0.0,{\lambda}_{3}=0.06$  1.11/1.90 
${\lambda}_{1}=1.0,{\lambda}_{2}=1.0,{\lambda}_{3}=1.0$  0.57/0.92 
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ruan, K.; Di, X. InfoSTGCAN: An InformationMaximizing SpatialTemporal Graph Convolutional Attention Network for Heterogeneous Human Trajectory Prediction. Computers 2024, 13, 151. https://doi.org/10.3390/computers13060151