Article

Pedestrian Trajectory Prediction Based on Delaunay Triangulation and Density-Adaptive Higher-Order Graph Convolutional Network

School of Mechanical and Electronic Engineering, Wuhan University of Technology, Wuhan 430070, China
*
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2026, 15(1), 42; https://doi.org/10.3390/ijgi15010042
Submission received: 24 November 2025 / Revised: 9 January 2026 / Accepted: 12 January 2026 / Published: 15 January 2026

Abstract

Pedestrian trajectory prediction plays a vital role in autonomous driving and intelligent surveillance systems. Graph neural networks (GNNs) have shown remarkable effectiveness in this task by explicitly modeling social interactions among pedestrians. However, existing methods suffer from two key limitations. First, they face difficulty in balancing the reduction in redundant connections with the preservation of critical interaction relationships in spatial graph construction. Second, higher-order graph convolution methods lack adaptability to varying crowd densities. To address these limitations, we propose a pedestrian trajectory prediction method based on Delaunay triangulation and density-adaptive higher-order graph convolution. First, we leverage Delaunay triangulation to construct a sparse, geometrically principled adjacency structure for spatial interaction graphs, which effectively eliminates redundant connections while preserving essential proximity relationships. Second, we design a density-adaptive order selection mechanism that dynamically adjusts the graph convolution order according to pedestrian density. Experiments on the ETH/UCY datasets show that our method achieves 5.6% and 9.4% reductions in average displacement error (ADE) and final displacement error (FDE), respectively, compared with the recent graph convolution-based method DSTIGCN, demonstrating the effectiveness of the proposed approach.

1. Introduction

1.1. Research Background

Autonomous navigation of intelligent mobile robots in outdoor environments represents a critical research area in artificial intelligence, with applications spanning campus delivery, autonomous driving, and intelligent security systems [1]. In these dynamic outdoor environments, pedestrians are the primary moving agents, and accurate prediction of their trajectories is crucial for effective path planning and safe human–robot interaction. Compared to indoor environments, outdoor settings feature larger spatial scales, greater variations in pedestrian density, and more complex interaction patterns, making trajectory prediction significantly more challenging [2].
Accurate and efficient pedestrian trajectory prediction is essential for intelligent mobile robot systems [3]. In terms of prediction accuracy, precise trajectory prediction enables early identification of potential human–robot conflict risks, providing reliable input for path planning algorithms. Regarding computational efficiency, real-time performance is a core requirement to meet the rapid response needs of “perception–prediction–decision” processes for autonomous vehicles, delivery robots, and other intelligent agents. Therefore, balancing computational efficiency with prediction accuracy remains a critical challenge [4].
However, the complexity of human movement behavior poses numerous challenges for pedestrian trajectory prediction. Pedestrian movement decisions are influenced not only by individual intentions and goals but also by social interactions with surrounding pedestrians [5]. These interaction relationships exhibit multi-level and multi-scale characteristics. While the direct influence of nearby pedestrians is significant, the indirect influence of distant pedestrians through intermediary agents is equally important. Moreover, accurate modeling of these complex social interaction relationships typically incurs additional computational overhead, which conflicts with real-time requirements. Therefore, achieving accurate modeling of complex social interactions within real-time constraints is the key to improving trajectory prediction accuracy.

1.2. Literature Review

The development of pedestrian trajectory prediction has undergone an evolution from traditional physical models to modern deep learning methods, with each paradigm offering distinct modeling perspectives and performance characteristics.

1.2.1. Physical Model-Based Methods

Early approaches to pedestrian trajectory prediction relied primarily on physics-based models that describe pedestrian movement through mathematical equations. These methods encompass kinematic models and social force models.
Kinematic models predict future trajectories using kinematic equations based on historical motion states (e.g., position, velocity, and acceleration). Representative examples include the constant velocity (CV) model [6] and constant acceleration (CA) model [7]. These models offer low computational complexity and excellent real-time performance, making them suitable for resource-constrained applications. However, kinematic models consider only individual motion states while neglecting social interactions between pedestrians, resulting in limited accuracy in complex multi-agent scenarios.
In contrast, social force models consider interaction relationships between pedestrians. The Social Force Model (SFM) proposed by Helbing et al. [8] is a representative method that models pedestrian motion as the superposition of multiple “forces,” including goal attraction force, pedestrian repulsion force, and obstacle repulsion force. Through the synthesis of these forces, the model can calculate pedestrian motion direction and velocity. Social force models offer high interpretability, providing intuitive insights into the physical mechanisms underlying pedestrian motion. However, these models struggle to capture diverse behavioral patterns in complex scenarios, resulting in limited prediction accuracy for real-world applications.

1.2.2. Deep Learning-Based Methods

In recent years, deep learning methods have significantly advanced pedestrian trajectory prediction by learning complex movement patterns from data. These data-driven approaches have become the dominant paradigm in this field.
Early deep learning trajectory prediction methods were primarily based on Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory network (LSTM) [9,10] and Gated Recurrent Unit (GRU) [11]. These methods focus on temporal sequence modeling, but they inadequately consider spatial interaction relationships among pedestrians. To address this limitation, Alahi et al. [12] proposed Social-LSTM, which introduces a social pooling mechanism that models social interactions by pooling hidden states of neighboring pedestrians within predefined grid regions. Gupta et al. [13] further proposed Social-GAN, which combines generative adversarial network frameworks to enhance the model’s ability to generate diverse trajectories. However, the social pooling mechanism uses predefined grid regions to aggregate neighbor information, failing to distinguish the varying influences of different neighboring pedestrians on target agents. Moreover, fixed grid partitioning lacks adaptability to dynamically changing interaction relationships and varying spatial scales across scenarios.
Inspired by the success of Transformers in natural language processing [14], researchers have adapted this architecture for trajectory prediction. Giuliari et al. [15] proposed a Transformer-based trajectory prediction framework that captures spatio-temporal dependencies through self-attention mechanisms. While this approach achieves strong performance for deterministic trajectory prediction, its autoregressive decoding can suffer from error accumulation over long prediction horizons. Yuan et al. [16] proposed AgentFormer, which designs an agent-aware attention mechanism to model multi-agent interactions. These methods flexibly model dependencies between arbitrary spatio-temporal positions. However, their computational complexity increases significantly with both sequence length and the number of agents, posing efficiency challenges for crowded, long-horizon prediction scenarios.
Graph Neural Networks (GNNs) can naturally represent and process non-Euclidean structured data, and graph convolution operations possess efficient parallel computing characteristics. These properties enable GNNs to achieve superior computational efficiency in real-time multi-pedestrian prediction scenarios, making them effective tools for modeling many-to-many interaction relationships in pedestrian trajectory prediction. Mohamed et al. [17] introduced Social-STGCNN, pioneering the application of graph convolutional networks to pedestrian trajectory prediction. This approach represents each pedestrian as a graph node, encodes historical trajectory information as node features, and models potential interaction relationships as graph edges. Based on this graph representation, the model aggregates neighborhood information through graph convolution operations, learning pedestrian motion representations that incorporate social interaction information. The core of GNN-based methods lies in graph construction and feature extraction through graph convolutions.
Existing graph construction methods primarily fall into two categories: fully connected graphs and sparse graphs. Fully connected graphs connect all pedestrian pairs, with edge weights determined by manual settings or learned through attention mechanisms. Social-STGCNN [17] assumes that interaction strength is inversely proportional to inter-pedestrian distance, setting edge weights as the reciprocal of relative distances. Huang et al. [18] proposed STGAT, which employs graph attention mechanisms to adaptively learn interaction importance weights. As shown in Figure 1a, while fully connected graphs can theoretically capture all potential interactions, they incur high computational costs that hinder real-time processing in crowded scenarios. Moreover, excessive connections may introduce noise and degrade generalization performance. In contrast, sparse graph methods reduce computational complexity by decreasing the number of graph edges. Shi et al. [19] proposed SGCN, which employs threshold-based pruning to filter redundant connections, partially addressing the computational burden of fully connected graphs. However, the fixed threshold setting in this method exhibits limited adaptability to scenarios with varying pedestrian densities. Sang et al. [20] proposed RDGCN, a reasonably dense graph convolution network. It uses the RSigmoid function for rational interaction weight reallocation and asymmetric 3D convolutions for spatio-temporal fusion. In recent years, some researchers have explored the integration of physical constraints and prior knowledge into trajectory prediction models [21]. For example, Chen et al. [22] proposed IMGCN, an interpretable masked graph convolutional network. This method establishes connections only for pedestrian pairs simultaneously satisfying two criteria: residing within a 10° field-of-view sector and maintaining Euclidean distances below 5 m, thereby filtering irrelevant interaction relationships. 
These methods demonstrate strong interpretability based on human visual perception prior knowledge and effectively reduce the number of graph edges by filtering obviously irrelevant distant interactions, lowering computational complexity. However, such fixed physical constraints exhibit inherent limitations. On one hand, the field-of-view sector and distance thresholds rely on prior knowledge and empirical settings, limiting adaptability across diverse scenarios. On the other hand, this method may miss important interaction relationships. As shown in Figure 1b, when two pedestrians move in parallel, they continuously coordinate trajectories to maintain parallel states and avoid collisions despite being outside each other’s field of view, exhibiting strong implicit interaction. However, field-of-view constraint-based methods fail to capture such important interaction relationships.
Based on the graph topological structure constructed above, graph convolution achieves spatial feature extraction through message passing and neighborhood aggregation mechanisms. Traditional graph neural networks primarily model direct interactions between adjacent nodes, capturing only one-hop interaction relationships. However, in real scenarios, pedestrian motion is not only influenced by direct neighbors but also by indirect neighbors. As shown in Figure 2, the red pedestrian is the current target pedestrian, and the blue pedestrians are neighbors of the target pedestrian. Figure 2a depicts the one-hop neighborhood, comprising only directly connected individuals. Figure 2b,c extend to two-hop and three-hop neighborhoods, respectively, capturing increasingly indirect influences transmitted through intermediate nodes. As the hop count increases, the target pedestrian’s receptive field expands, encompassing richer multi-level interaction patterns.
To capture indirect interaction relationships among pedestrians, researchers have explored higher-order graph convolution methods. The core idea is to expand the neighborhood range through powers of the adjacency matrix. Specifically, the first-order adjacency matrix A^1 captures 1-hop neighbors, the second-order adjacency matrix A^2 = A^1 × A^1 captures 2-hop neighbors, and the k-th order adjacency matrix A^k captures k-hop neighbors. By fusing features of different orders, the model can simultaneously capture direct interactions and multi-level indirect interactions. In related research, Sami et al. [23] proposed MixHop, which learns higher-order relationships by mixing adjacency matrix powers of different orders. Kim et al. [24] introduced higher-order graph convolution concepts into trajectory prediction, proposing the HighGraph model that captures indirect influences among pedestrians through multiple power operations. However, HighGraph adopts fully connected graph construction strategies, resulting in substantial redundant connections. To address redundant interaction issues, Chen et al. [25] proposed PCHGCN, a physically constrained higher-order graph convolution method. By integrating physical prior constraints such as distance, field of view, and distance change rate, this method filters redundant information in higher-order interactions and optimizes feature effectiveness. However, this method adopts a fixed 3rd-order setting, lacking adaptive capability for different scenarios.
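As a concrete illustration of the matrix-power idea described above (a hedged sketch, not code from the paper), the following snippet derives k-hop reachability from a binary adjacency matrix; the chain graph and the function name are illustrative assumptions.

```python
import numpy as np

def k_hop_adjacency(A, k):
    """Return a binary matrix whose (i, j) entry is 1 iff there is a
    walk of length exactly k from node i to node j (A is N x N, binary)."""
    A_power = np.linalg.matrix_power(A, k)
    return (A_power > 0).astype(int)

# Chain graph 0-1-2-3: node 3 is reachable from node 0 only via 3 hops.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
A2 = k_hop_adjacency(A, 2)   # captures two-hop neighbors, e.g. (0, 2)
A3 = k_hop_adjacency(A, 3)   # captures three-hop neighbors, e.g. (0, 3)
```

Note that walks (not simple paths) are counted, which is exactly the behavior of adjacency matrix powers; higher-order graph convolutions fuse features aggregated over these expanded neighborhoods.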
Recent research has begun to explore scene-adaptive mechanisms to improve trajectory prediction performance. For example, the DAMM model proposed by Wen et al. [26] achieves dynamic selection of neighbor interaction ranges under different density scenarios through multi-scale motif matrix fusion. This model adapts small-range interactions in high-density scenarios to reduce redundant computation and expand ranges in low-density scenarios to capture weak interactions, improving prediction accuracy on the nuScenes dataset. However, existing higher-order graph convolutional network methods still commonly adopt unified order settings, lacking similar adaptive capabilities. Unified order settings may not be suitable for all situations. For example, in dense scenarios, high-hop neighbors may introduce excessive noise, while in sparse scenarios, using only low-hop neighbors may fail to fully model complex social influences. This limitation indicates a need for methods that can dynamically select orders based on scene characteristics and effectively fuse multi-order features to fully leverage the advantages of higher-order graph convolution in trajectory prediction.

1.2.3. Literature Summary

The above research shows that pedestrian trajectory prediction methods have transitioned from physics-driven to data-driven approaches. While physics-based methods have low computational overhead, they struggle to capture the complexity and diversity of pedestrian behavior in real scenarios due to their reliance on prior assumptions and manually designed rules, resulting in limited prediction accuracy in crowded, complex interaction scenarios. Deep learning-based methods automatically learn complex pedestrian movement patterns and interaction modes through data-driven approaches, significantly improving prediction accuracy and generalization ability. Among them, RNN-based methods can capture temporal dependencies, and Transformer-based methods have strong long-term dependency modeling capabilities. However, RNNs’ sequential processing limits parallel computation efficiency. Meanwhile, Transformers face scalability challenges in large-scale multi-agent scenarios due to their computational complexity increasing substantially with both the number of agents and sequence length, posing efficiency bottlenecks for real-time applications. In contrast, GNN-based methods demonstrate unique advantages in pedestrian trajectory prediction tasks:
  • Explicit modeling of social interaction relationships. By representing pedestrians as graph nodes and interaction relationships as graph edges, graph neural networks can explicitly model many-to-many social interactions with enhanced interpretability.
  • Efficient parallel computation capability. Graph convolution operations possess inherent parallel computation characteristics, enabling simultaneous processing of feature aggregation for all pedestrians in the scene. This demonstrates superior computational efficiency in multi-pedestrian real-time prediction scenarios, meeting the real-time requirements of applications like autonomous driving.
However, existing GNN-based trajectory prediction methods face two critical challenges:
  • Balance issue in graph construction. Fully connected methods capture all potential interactions at the cost of substantial noise, while physics-constrained methods achieve computational efficiency but risk missing critical relationships. Therefore, maintaining important spatial proximity relationships while effectively avoiding redundant connections, and thereby constructing more reasonable graph structures, is a core challenge for current GNN-based trajectory prediction methods.
  • Limited scene adaptability. While higher-order graph convolution methods effectively model indirect influences, existing approaches employ fixed-order configurations, lacking adaptive adjustment mechanisms for different scenarios. Dynamically adjusting convolution orders based on scene characteristics, optimizing computational efficiency while ensuring prediction accuracy, is a key issue in enhancing the practical utility of higher-order graph convolution methods.

1.3. Main Contributions

To address the insufficiencies in graph construction balance and scene-adaptive capabilities of existing GNN-based methods, we propose a pedestrian trajectory prediction method leveraging Delaunay triangulation and density-adaptive higher-order graph convolution. This approach effectively resolves the balance issue between redundant connections and missing important interactions in graph construction, as well as the lack of scene-adaptive capability in higher-order graph convolution, thereby enhancing model performance. Our main contributions are threefold:
  • Delaunay triangulation-based sparse graph construction. We utilize the geometric properties of Delaunay triangulation to construct a spatial graph structure that maintains spatial proximity relationships while avoiding redundant connections. This provides a more reasonable topological foundation for subsequent graph convolution operations, thereby improving prediction accuracy.
  • Density-adaptive higher-order graph convolution. We dynamically select the optimal graph convolution order based on local pedestrian density. The mechanism selects low-order convolution in high-density scenarios to avoid the negative effects of visual occlusion, and higher-order convolution in low-density scenarios to adequately capture indirect interaction relationships. This adaptive mechanism balances prediction accuracy and computational efficiency across varying scene densities.
  • Efficient computational optimization. We introduce a first-frame caching mechanism that computes density and optimal order only for the first frame of sequences, with subsequent frames reusing these values to avoid redundant computation. Additionally, a masked adaptive weight fusion module can dynamically fuse multi-order effective features, avoiding interference from invalid orders. This enhances feature representation capability, contributing to improved prediction accuracy, while ensuring the feature fusion process is efficient and redundancy-free.

1.4. Paper Structure

The remainder of this paper is organized as follows: Section 2 elaborates on the proposed pedestrian trajectory prediction method, including the overall model architecture, Delaunay triangulation-based graph construction, and density-adaptive higher-order graph convolution. Section 3 presents experimental details, along with the results of quantitative and qualitative analyses. Section 4 summarizes the main work and outlines future research directions.

2. Method

2.1. Problem Definition

Suppose there are N pedestrians in a scene, where n ∈ {1, 2, …, N} is the pedestrian index. Each pedestrian's spatial position in the scene maps to a set of coordinates (x_n^t, y_n^t) in the world coordinate system, where t ∈ {1, 2, …, T_obs} indexes the time steps of the observation period. This set of continuous coordinates represents the pedestrian's observed trajectory. Given the trajectories of N pedestrians during time steps 1 to T_obs, our objective is to predict their future positions from time steps T_obs + 1 to T_pred, yielding the predicted trajectories P̂ = {p̂_n^t = (x̂_n^t, ŷ_n^t)}. Consistent with previous work [12,17,19,22,27], we assume that pedestrian trajectory positions follow a bivariate Gaussian distribution. The predictor directly outputs the distribution parameters p̂_n^t ∼ N(μ̂_x,n^t, μ̂_y,n^t, σ̂_x,n^t, σ̂_y,n^t, ρ̂_n^t), where (μ̂_x,n^t, μ̂_y,n^t) are the means of target n's trajectory in the x and y directions at time t, (σ̂_x,n^t, σ̂_y,n^t) are the corresponding standard deviations, and ρ̂_n^t is the correlation coefficient between the trajectories in the two directions.
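To make this output parameterization concrete, the following hedged sketch assembles the 2 × 2 covariance matrix from the predicted (sigma_x, sigma_y, rho) and samples candidate future positions from the resulting bivariate Gaussian. The function name, the fixed seed, and the parameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sample_position(mu_x, mu_y, sigma_x, sigma_y, rho, n_samples=1, seed=0):
    """Sample future positions from the predicted bivariate Gaussian
    (mu_x, mu_y, sigma_x, sigma_y, rho) for one pedestrian and time step."""
    mean = np.array([mu_x, mu_y])
    # Covariance of a bivariate Gaussian with correlation coefficient rho.
    cov = np.array([
        [sigma_x**2,              rho * sigma_x * sigma_y],
        [rho * sigma_x * sigma_y, sigma_y**2],
    ])
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_samples)

# 20 candidate positions around the predicted mean (1.0, 2.0).
samples = sample_position(1.0, 2.0, 0.3, 0.2, 0.5, n_samples=20)
```

Sampling multiple candidates per step is what enables the multimodal (best-of-K) evaluation protocol used by the prior work cited above.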

2.2. Overall Model

Among existing GNN-based trajectory prediction methods, SGCN demonstrates strong prediction performance and features a dual-branch spatiotemporal decoupling architecture, which provides a solid foundation for targeted improvements. Therefore, based on SGCN’s dual-branch core architecture, we construct an improved model framework adapted for complex scenarios. As shown in Figure 3, SGCN comprises three main components: a spatial branch, a temporal branch, and a prediction branch. The spatial branch models social interactions among pedestrians, while the temporal branch captures individual pedestrian temporal dependency features. The prediction branch processes fused spatiotemporal features through Temporal Convolutional Network (TCN). It outputs bivariate Gaussian distribution parameters for future trajectories, providing probabilistic representations for trajectory prediction.
To address limitations of existing graph construction methods and the lack of scene-adaptive capability in graph convolution, we conduct targeted architectural improvements. While retaining SGCN’s temporal branch and TCN module, we redesign the spatial branch around the goal of accurate pedestrian social interaction modeling. Specifically, we remove the original fixed-threshold sparsification module and introduce a Delaunay triangulation-based sparse graph construction module to ensure connection rationality. Additionally, we reconstruct convolution units into density-adaptive higher-order structures to adapt to different scenarios. The reconstructed model’s overall architecture is shown in Figure 4.
The spatial branch first constructs a spatial graph and applies an attention mechanism to obtain a spatial attention score matrix, preliminarily representing pedestrian social interactions. Concurrently, historical trajectories are input into the Delaunay triangulation module, which constructs a geometrically reasonable spatial graph structure based on pedestrian positions, thereby generating a Delaunay triangulation mask matrix. The spatial graph, spatial attention score matrix, and Delaunay triangulation mask matrix are then passed to the higher-order graph module to capture multi-scale social interaction features. Subsequently, weighted fusion of multi-order social interaction features is achieved through the gated fusion module.
The temporal branch first constructs a temporal graph based on historical trajectory information and then obtains a temporal asymmetric attention score matrix, i.e., the temporal adjacency matrix, via the attention mechanism. This adjacency matrix is then used in graph convolution to extract temporal interaction features.
Finally, the social interaction features from the spatial branch and the motion dynamic features from the temporal branch are fused. The fused features are processed through TCN, whose output layer generates bivariate Gaussian distribution parameters for each future time step, enabling multimodal trajectory prediction.

2.3. Delaunay Triangulation-Based Graph Construction

The spatial graph at time t is defined as G_S = (V_t, U_t), where V_t = {v_n^t | n = 1, …, N} and U_t = {u_{i,j}^t | i, j = 1, …, N} represent the nodes and edges of the spatial graph, respectively. Here, v_n^t = p_n^t − p_n^{t−1} is the displacement of pedestrian n from time t − 1 to time t, and u_{i,j}^t ∈ {0, 1} indicates whether two nodes are connected (1 for connected, 0 otherwise).
Edge weights in the spatial graph are calculated through a self-attention mechanism. The spatial attention weight matrix R_S ∈ ℝ^{N×N} is computed as follows:
E_S = φ(G_S, W_{E_S})
Q_S = φ(E_S, W_{Q_S})
K_S = φ(E_S, W_{K_S})
R_S = Softmax(Q_S K_S^T / √d_S)
where φ(·) denotes a linear transformation, E_S is the spatial graph embedding, Q_S and K_S are the queries and keys of the self-attention mechanism, W_{E_S}, W_{Q_S}, W_{K_S} are the weights of the linear transformations, and d_S is a scaling factor ensuring numerical stability.
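The attention computation above can be sketched in a few lines of numpy (a minimal illustration, assuming plain matrix products for the linear maps φ and illustrative dimensions; this is not the paper's implementation):

```python
import numpy as np

def spatial_attention(G_S, W_E, W_Q, W_K):
    """Compute R_S = softmax(Q_S K_S^T / sqrt(d_S)) from node features G_S."""
    E_S = G_S @ W_E                       # spatial graph embedding
    Q_S, K_S = E_S @ W_Q, E_S @ W_K       # queries and keys
    d_S = K_S.shape[-1]                   # scaling factor
    scores = Q_S @ K_S.T / np.sqrt(d_S)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)    # each row sums to 1

rng = np.random.default_rng(0)
N, f, d = 5, 2, 8                         # pedestrians, input feats, embed dim
G_S = rng.normal(size=(N, f))             # node features (e.g., displacements)
R_S = spatial_attention(G_S,
                        rng.normal(size=(f, d)),
                        rng.normal(size=(d, d)),
                        rng.normal(size=(d, d)))
```

Row i of the resulting N × N matrix distributes pedestrian i's attention over all pedestrians; the Delaunay mask introduced next zeroes out the entries that do not correspond to geometric neighbors.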
As discussed in Section 1.2.3, existing graph construction methods face an issue in balancing the reduction in redundant connections with the preservation of critical interaction relationships. To address this issue, we introduce Delaunay triangulation from the computational geometry field to construct spatial interaction graphs.
Triangulation partitions a finite planar point set P = {p_1, p_2, …, p_n} into a collection of non-overlapping triangles T = {t_1, t_2, …, t_m}, where the triangle vertices belong to P and no two triangles share interior points. Delaunay triangulation is a special type of triangulation characterized by the empty circle property: the interior of the circumcircle of any triangle contains no other point of the set, as shown in Figure 5. The circumcircle associated with triangle t_1 in Figure 5a contains no points in its interior, so it satisfies the Delaunay condition. In contrast, the circumcircle associated with triangle t_2 in Figure 5b contains point p_4, so it violates the condition.
In graph structure construction, Delaunay triangulation can maintain spatial proximity relationships while avoiding excessive connections, thereby constructing more reasonable graph topological structures. These characteristics make Delaunay triangulation a classic method in computational geometry, widely applied in fields such as geographic information systems [28], 3D model reconstruction [29], and finite element analysis [30]. In the machine learning field, particularly in graph neural network applications, Delaunay triangulation provides a theoretical foundation for constructing graph structures with geometrically meaningful properties. In crowd behavior analysis tasks, Delaunay triangulation has been used to approximate neighborhood interactions [31], which has been proven effective in quantifying crowd attributes.
Inspired by these insights, we employ Delaunay triangulation for pedestrian social interaction graph construction. Its empty circumcircle property ensures geometric optimality, maintaining important spatial adjacency relationships while avoiding excessive connections. In practical applications, when the number of pedestrians in a scene is less than 3, basic construction conditions for Delaunay triangulation cannot be satisfied. For such special cases, we adopt a fully connected strategy as an alternative solution to ensure algorithm robustness.
Based on the above analysis, we design a spatial interaction adjacency matrix construction algorithm using Delaunay triangulation, with specific processes detailed in Algorithm 1.
Then, the spatial attention score matrix R S and Delaunay triangulation mask matrix M D are element-wise multiplied to construct the spatial adjacency matrix:
A_spatial = R_S ⊙ M_D
where ⊙ denotes the Hadamard product. This operation retains interaction weights learned by the attention mechanism while filtering redundant connections through the Delaunay mask.
The final constructed pedestrian interaction topological graph is shown in Figure 6, where nodes represent each pedestrian in the scene, and node features contain historical trajectory information of that pedestrian. Edges represent interaction relationships among pedestrians. Edge connections are calculated through Delaunay triangulation, indicating whether interactions exist, while edge weights are learned through attention mechanisms reflecting interaction intensity.
Our approach offers advantages over existing methods. Compared to fully connected methods, Delaunay-based construction significantly reduces edge density, lowering computational complexity for subsequent graph convolutions. Unlike physics-constrained methods that rely on hand-tuned distance or angle thresholds, our approach adaptively generates connections based on spatial configuration, naturally capturing proximity-based interactions without hyperparameter tuning.
Algorithm 1 Spatial Interaction Adjacency Matrix Construction Based on Delaunay Triangulation
Input: P — position set of N pedestrians at a given time, P = {p_1, p_2, …, p_N}
Output: M_D — spatial interaction adjacency matrix
1: function BuildSpatialGraph(P, N)
2:     # Initialize an N × N zero matrix as the adjacency matrix; M_D[i, j] = 0 indicates that pedestrians i and j are initially unconnected
3:     M_D = zeros(N, N)
4:     # Handle the special case of too few pedestrians to form a valid triangulation by falling back to a fully connected graph
5:     if N < 3 then
6:         M_D = ones(N, N)
7:         return M_D
8:     else
9:         # Run Delaunay triangulation on the position set P, producing the triangle set DT
10:        DT = DelaunayTriangulation(P)
11:        # Traverse all generated triangles and record the three edges of each in the adjacency matrix
12:        for triangle (i, j, k) ∈ DT do
13:            M_D[i, j] = M_D[j, i] = 1
14:            M_D[j, k] = M_D[k, j] = 1
15:            M_D[i, k] = M_D[k, i] = 1
16:        end for
17:        # Add a self-loop to each node so that a pedestrian's own state is preserved during graph convolution
18:        for i = 1 to N do
19:            M_D[i, i] = 1
20:        end for
21:        return M_D
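As a concrete sketch, Algorithm 1 maps directly onto `scipy.spatial.Delaunay`; the function name `build_spatial_graph` and the dense NumPy representation are illustrative choices, not the authors' code:

```python
import numpy as np
from scipy.spatial import Delaunay

def build_spatial_graph(P):
    """Sketch of Algorithm 1: build a 0/1 spatial interaction adjacency
    matrix from pedestrian positions via Delaunay triangulation.

    P: (N, 2) array of pedestrian positions at one time step.
    Returns an (N, N) adjacency matrix with self-loops.
    """
    N = len(P)
    if N < 3:
        # Too few points for a triangulation: fall back to a fully
        # connected graph, as in the paper.
        return np.ones((N, N), dtype=int)
    M = np.zeros((N, N), dtype=int)
    tri = Delaunay(P)
    # Each simplex is a triangle (i, j, k); connect its three edges.
    for i, j, k in tri.simplices:
        M[i, j] = M[j, i] = 1
        M[j, k] = M[k, j] = 1
        M[i, k] = M[k, i] = 1
    # Self-loops preserve each pedestrian's own state during convolution.
    np.fill_diagonal(M, 1)
    return M
```

Note that `Delaunay` assumes the input points are not all collinear; degenerate frames would need the same fully connected fallback.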

2.4. Density-Adaptive Higher-Order Graph Convolution

Pedestrian density is a key factor affecting social interaction complexity. In sparse environments, pedestrians have wider fields of view and can perceive indirect influences from distant pedestrians, requiring higher-order modeling to capture these long-range interaction relationships. Conversely, in dense environments, pedestrians' fields of view are constrained by surrounding individuals, so they focus primarily on direct interactions with immediate neighbors. Excessive higher-order modeling then not only fails to capture useful information but may also introduce noise, degrading prediction accuracy.
To address the above issues, we propose a density-adaptive order selection mechanism. This mechanism dynamically adjusts graph convolution orders based on pedestrian density. Specifically, higher orders are used in low-density scenarios to capture distant interaction information, while lower orders are used in high-density scenarios to avoid noise interference. Through this adaptive strategy, the model can capture corresponding-order neighborhood interaction information in different density environments, improving computational efficiency while maintaining prediction accuracy. The process is illustrated in Figure 7.

2.4.1. Density Calculation

To implement density-adaptive order selection, we employ average local density as the scene density metric.
For pedestrian i at time t in a scene, local density is defined as the ratio of the number of neighboring pedestrians within radius r to the circular area:
$\rho_i^t = \frac{1}{\pi r^2} \sum_{j \neq i} \mathbb{I}\left(\left\| p_i^t - p_j^t \right\|_2 \le r\right)$ (6)
where $\mathbb{I}(\cdot)$ is an indicator function that equals 1 if the condition holds and 0 otherwise. According to proxemics theory [32], the personal space radius is approximately 1.2–1.5 m, within which pedestrians adjust their trajectories due to spatial awareness. Therefore, we set $r = 1.5$ m.
To obtain overall scene density characteristics, we calculate the average of all pedestrians’ local densities:
$\bar{\rho}^t = \frac{1}{N} \sum_{i=1}^{N} \rho_i^t$ (7)
where N is the total number of pedestrians in the scene.
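A vectorized sketch of Equations (6) and (7), assuming positions are given in metres as an (N, 2) NumPy array; the function name is illustrative:

```python
import numpy as np

def average_local_density(P, r=1.5):
    """Equations (6)-(7): scene-average local pedestrian density.

    P: (N, 2) pedestrian positions; r: personal-space radius in metres
    (r = 1.5 m following proxemics theory).
    """
    # Pairwise Euclidean distances between all pedestrians.
    diff = P[:, None, :] - P[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Count neighbours within radius r, excluding the pedestrian itself
    # (the diagonal is always <= r, so subtract 1).
    neighbours = (dist <= r).sum(axis=1) - 1
    rho = neighbours / (np.pi * r ** 2)  # local density per pedestrian
    return rho.mean()                    # scene-average density
```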

2.4.2. Adaptive Maximum Order Selection

The order of graph convolution directly determines both the range and accuracy of social interaction modeling. First-order convolution (order = 1) focuses on capturing direct proximity relationships, while higher-order convolution (order ≥ 2) models indirect interactions through multi-hop neighbors. However, fixed-order settings exhibit limited adaptability to dynamically changing pedestrian density scenarios. Based on the density-dependent characteristics of pedestrian interaction behavior and the representation properties of graph convolution, we conduct the following theoretical analysis:
  • In low-density scenarios, spatial constraints among individuals are few and pedestrians have relatively wide fields of view, enabling them to perceive and respond to social information from relatively distant locations. Third-order graph convolution can effectively capture these distant indirect influences.
  • In medium-density scenarios, individuals need to focus on both direct neighbors and indirect influences transmitted by two-hop neighbors through intermediate nodes. Second-order graph convolution balances the modeling of direct and indirect interactions.
  • In high-density scenarios, visual occlusion significantly limits individuals’ attentional resources, focusing attention primarily on immediate neighbors’ behaviors. First-order graph convolution can meet requirements while avoiding redundant information.
Based on this analysis, to achieve dynamic adaptation of convolution orders under different density scenarios, we establish a piecewise constant order selection function with the average local density as input. The specific form is as follows:
$K_{max} = \Phi(\bar{\rho}^t) = \begin{cases} 3, & \text{if } \bar{\rho}^t \le \theta_1 \\ 2, & \text{if } \theta_1 < \bar{\rho}^t \le \theta_2 \\ 1, & \text{if } \bar{\rho}^t > \theta_2 \end{cases}$ (8)
where $K_{max}$ is the maximum order, $\bar{\rho}^t$ is the average local density at time $t$, and $\theta_1$, $\theta_2$ are density threshold hyperparameters satisfying $0 < \theta_1 \le \theta_2$, determined via the grid search experiments in Section 3.5.4.
$(\theta_1^*, \theta_2^*) = \arg\min_{\theta_1, \theta_2} L(\theta_1, \theta_2)$
where $L$ is the loss function on the validation set. The experiments yielded optimal thresholds $\theta_1^* = 0.14$ and $\theta_2^* = 0.30$.
Additionally, within short time windows (e.g., observed sequence frames), scene density typically remains relatively stable. Therefore, first-frame density can be used as a representative measure for the entire sequence. To improve computational efficiency and ensure sequence consistency, we design a first-frame caching strategy. This mechanism computes average local density for the first timestep of each sequence using Equations (6) and (7), then determines maximum convolution order according to Equation (8). For subsequent timesteps within the same sequence, the mechanism reuses the cached order, eliminating redundant computations. When processing new sequences, the cache is cleared and the order is recalculated.
$K_{max}[s, t] = \begin{cases} \Phi\left(\bar{\rho}_s^1\right), & t = 1 \\ K_{max}[s, 1], & t > 1 \end{cases}$
where $s$ denotes the sequence index, $t$ denotes the time step index within the sequence, $\bar{\rho}_s^1$ denotes the average density of the first frame of the $s$-th sequence, and $\Phi(\cdot)$ represents the density-to-order mapping function.
Based on the preceding analysis, we design a density-adaptive order selection algorithm, with specific processes presented in Algorithm 2.
Algorithm 2 Density-Adaptive Order Selection Algorithm
Input: P — position set of N pedestrians at a given time, P = {p_1, p_2, …, p_N}; s — sequence index; t — time step index
Output: K_max(s, t) — maximum graph convolution order
1: function AdaptiveOrderSelection(P, s, t)
2:     # If this is the first frame of the sequence, compute the density; otherwise return the cached order
3:     if t = 1 then
4:         # Calculate the local density of each pedestrian
5:         for i = 1 to N do
6:             count = 0
7:             for j = 1 to N, j ≠ i do
8:                 if ‖p_i^t − p_j^t‖ ≤ r then
9:                     count = count + 1
10:            end for
11:            ρ_i^t = count / (πr²)
12:        end for
13:        # Calculate the scene's average local density
14:        ρ̄^t = (1/N) Σ_{i=1}^N ρ_i^t
15:        # Adaptively select the maximum convolution order according to the density thresholds
16:        if ρ̄^t ≤ θ_1 then K_max(s, 1) = 3
17:        else if θ_1 < ρ̄^t ≤ θ_2 then K_max(s, 1) = 2
18:        else K_max(s, 1) = 1
19:        # Cache the order computed from the first frame
20:        Cache[s] = K_max(s, 1)
21:    else
22:        # For non-first frames, read the order directly from the cache
23:        K_max(s, t) = Cache[s]
24:    return K_max(s, t)
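Algorithm 2 and the first-frame caching strategy can be sketched as follows. The threshold values are the optima reported in Section 3.5.4; the cache layout and function names are illustrative:

```python
# Density thresholds from the grid search in Section 3.5.4.
THETA1, THETA2 = 0.14, 0.30

def density_to_order(rho_bar):
    """Equation (8): map average local density to a maximum order."""
    if rho_bar <= THETA1:
        return 3  # sparse scene: capture long-range indirect interactions
    elif rho_bar <= THETA2:
        return 2  # medium density: balance direct and indirect interactions
    return 1      # dense scene: focus on immediate neighbours only

_order_cache = {}  # sequence index -> cached first-frame order

def adaptive_order(seq_id, t, avg_density_fn):
    """Sketch of Algorithm 2 with the first-frame caching strategy:
    the order is computed once per sequence (t = 1) and reused for
    all later frames. avg_density_fn() returns the scene's average
    local density, e.g. via Equations (6)-(7)."""
    if t == 1 or seq_id not in _order_cache:
        _order_cache[seq_id] = density_to_order(avg_density_fn())
    return _order_cache[seq_id]
```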

2.4.3. Higher-Order Graph Convolution

Higher-order social interaction modeling requires constructing adjacency matrices that represent multi-hop associations. Based on the first-order spatial adjacency matrix $A_{spatial}$ obtained in Section 2.3, higher-order spatial adjacency matrices are constructed through power operations on the first-order matrix. We adopt a progressive calculation strategy, computing only up to the density-adaptive maximum order $K_{max}$:
$A^{(1)} = A_{spatial}$
$A^{(k)} = \left(A^{(1)}\right)^k, \quad k \in \{2, 3, \ldots, K_{max}\}$
Subsequently, the spatial graph $G_S$ and each order's adjacency matrix $A^{(k)}$ are processed through graph convolution to extract the corresponding social interaction features $H^{(k)}$:
$H^{(k)} = \sigma\left(A^{(k)} G_S W_{G_S}^{(k)}\right), \quad k = 1, 2, \ldots, K_{max}$
where $W_{G_S}^{(k)}$ is the $k$-th order learnable weight matrix, $\sigma(\cdot)$ denotes the activation function, and $H^{(k)}$ denotes the $k$-th order output features.
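A minimal sketch of the progressive power computation and a single order's graph convolution, using dense NumPy matrices for clarity (the actual model presumably operates on batched tensors with learned weights):

```python
import numpy as np

def higher_order_adjacencies(A, k_max):
    """Progressive strategy: powers A^(1), ..., A^(K_max) of the
    first-order adjacency, each obtained from the previous one."""
    mats = [A]
    for _ in range(k_max - 1):
        mats.append(mats[-1] @ A)  # A^(k) = A^(k-1) . A
    return mats

def graph_conv(A_k, X, W):
    """One order's graph convolution H^(k) = sigma(A^(k) X W),
    with ReLU standing in for the activation sigma."""
    return np.maximum(A_k @ X @ W, 0.0)
```

On a chain graph 0–1–2 (with self-loops), the second power connects nodes 0 and 2, which is exactly the two-hop interaction the higher-order branch is meant to expose.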

2.4.4. Adaptive Weight Fusion

We fuse multi-order features through a gating mechanism. Since order configurations vary dynamically across scenarios, maintaining network structure stability requires addressing feature alignment issues. Uncomputed higher-order positions are padded with the maximum-order effective features.
$H^{(k)} = H^{(K_{max})}, \quad k = K_{max}+1, \ldots, 3$
Each order's features $H^{(k)}$ undergo dimensionality reduction, generating compact representations $F^{(k)}$ for weight calculation:
$F^{(k)} = \mathrm{PReLU}\left(\mathrm{Conv}\left(H^{(k)}\right)\right), \quad k = 1, 2, 3$
To prevent padded features from affecting prediction, we design an order mask vector whose first $K_{max}$ elements equal 1 and whose remaining elements equal 0. This mask ensures that only actually calculated orders participate in weight allocation.
$M_{order} = [\underbrace{1, \ldots, 1}_{K_{max}}, \underbrace{0, \ldots, 0}_{3-K_{max}}]^T$
Concatenate compressed features and apply the order mask:
$F_{concat} = \mathrm{Concat}\left(F^{(1)}, F^{(2)}, F^{(3)}\right)$
$F_{masked} = F_{concat} \odot M_{order}$
The adaptive weights are computed via softmax:
$W = \left(W^{(1)}, W^{(2)}, W^{(3)}\right) = \mathrm{Softmax}\left(F_{masked}\right)$
Each order’s features undergo convolution and residual connections, enhancing representational capability while preserving original information:
$\tilde{F}^{(k)} = \mathrm{PReLU}\left(\mathrm{Conv}\left(H^{(k)}\right)\right) + H^{(k)}$
The final spatial features are obtained through the weighted summation of the adaptive weights and enhanced features:
$F_{spatial} = \sum_{k=1}^{3} W^{(k)} \tilde{F}^{(k)}$
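The masked fusion pipeline can be sketched as below. A scalar mean stands in for the learned Conv + PReLU compression, so this illustrates only the padding, masking, and softmax logic, not the trained layers:

```python
import numpy as np

def fuse_orders(features, k_max):
    """Masked adaptive fusion of per-order features.

    features: list of K_max arrays of identical shape (the computed
    H^(1..K_max)). Orders k > K_max are padded with the highest
    computed order's features, then excluded from the softmax by the
    order mask so padding never influences the weights.
    """
    # Pad the feature bank to a fixed size of 3 orders.
    padded = features + [features[-1]] * (3 - k_max)
    # Toy compression: the mean activation serves as each order's
    # score, standing in for F^(k) = PReLU(Conv(H^(k))).
    scores = np.array([f.mean() for f in padded])
    # Order mask: only the first K_max entries participate.
    mask = np.array([1.0] * k_max + [0.0] * (3 - k_max))
    exp = np.exp(scores - scores.max()) * mask
    weights = exp / exp.sum()  # masked softmax over order scores
    # Weighted sum of the per-order features.
    return sum(w * f for w, f in zip(weights, padded)), weights
```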

2.5. Temporal Interaction Modeling

The temporal branch captures temporal dependency relationships and dynamic features of pedestrian motion. For pedestrian $n$'s observed trajectory sequence, we construct the temporal graph $G_T = (V_n, U_n)$, where $V_n = \{v_n^t \mid t = 1, \ldots, T_{obs}\}$ and $U_n = \{u_n^{k,q} \mid k, q = 1, \ldots, T_{obs}\}$ represent the nodes and edges of the temporal graph, respectively. Here, $v_n^t = p_n^t - p_n^{t-1}$ represents pedestrian $n$'s displacement from time $t-1$ to time $t$, and $u_n^{k,q} \in \{0, 1\}$ indicates whether nodes are connected (1 for connected, 0 otherwise).
Similarly to the spatial branch, we calculate the temporal attention weight matrix $R_T \in \mathbb{R}^{T_{obs} \times T_{obs}}$ through the self-attention mechanism as follows:
$E_T = \phi\left(G_T, W_{E_T}\right)$
$Q_T = \phi\left(E_T, W_{Q_T}\right)$
$K_T = \phi\left(E_T, W_{K_T}\right)$
$R_T = \mathrm{Softmax}\left(\frac{Q_T K_T^T}{\sqrt{d_T}}\right)$
where $\phi(\cdot)$ represents a linear transformation, $E_T$ represents the temporal graph embedding, $Q_T$ and $K_T$ represent the queries and keys in the self-attention mechanism, $W_{E_T}, W_{Q_T}, W_{K_T}$ represent the weights of the linear transformations, and $\sqrt{d_T}$ is a scaling factor ensuring value stability.
Then we combine this with the temporal identity matrix I t e m p o r a l to construct the final temporal adjacency matrix:
$A_{temporal} = R_T + I_{temporal}$
Finally, we process the temporal graph and the temporal adjacency matrix through graph convolution to extract temporal features F t e m p o r a l :
$F_{temporal} = \sigma\left(A_{temporal} G_T W_{G_T}\right)$
where $G_T$ denotes the temporal node features and $W_{G_T}$ is the learnable weight matrix for the temporal graph convolution.
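A sketch of the scaled dot-product self-attention that yields the temporal weight matrix; plain matrix multiplications replace the paper's learned linear transformations $\phi(\cdot)$, and all shapes are illustrative:

```python
import numpy as np

def temporal_attention(E, Wq, Wk):
    """Scaled dot-product self-attention over temporal-graph
    embeddings E (one row per time step), returning a row-stochastic
    attention weight matrix R_T."""
    Q, K = E @ Wq, E @ Wk           # queries and keys
    d = K.shape[-1]                  # scaling dimension d_T
    logits = Q @ K.T / np.sqrt(d)
    # Numerically stable row-wise softmax.
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)
```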

2.6. Temporal Convolutional Network and Trajectory Prediction

2.6.1. Spatio-Temporal Feature Fusion

The spatial branch obtains rich social interaction features F s p a t i a l through density-adaptive higher-order graph convolution, while the temporal branch provides temporal motion features F t e m p o r a l . To integrate these two types of complementary information, we perform additive feature fusion:
$F_{fused} = \mathrm{Conv}_{fusion}^{(s)}\left(F_{spatial}\right) + \mathrm{Conv}_{fusion}^{(t)}\left(F_{temporal}\right)$

2.6.2. Multimodal Trajectory Prediction

The fused spatio-temporal features are processed through TCNs, which consist of multiple one-dimensional convolutional layers with residual connections. The first layer transforms the observation length $T_{obs}$ into the prediction length $T_{pred}$:
$F_{tcn}^{(1)} = \mathrm{Conv}_{obs \rightarrow pred}\left(F_{fused}\right)$
Subsequent layers maintain prediction length dimensions, mainly used for feature extraction and representation learning. Finally, the prediction layer outputs bivariate Gaussian distribution parameters for each future time step:
$\left[\hat{\mu}_{x,n}^t, \hat{\mu}_{y,n}^t, \hat{\sigma}_{x,n}^t, \hat{\sigma}_{y,n}^t, \hat{\rho}_n^t\right] = \mathrm{TCNs}\left(F_{tcn}^{(L)}\right)$

2.7. Loss Function

We model trajectory uncertainty through bivariate Gaussian distributions, enabling multimodal predictions. Optimal model parameters correspond to maximum likelihood, achieved by minimizing the loss function. Therefore, we train the model by minimizing the negative log-likelihood loss function:
$L = -\sum_{n=1}^{N} \sum_{t=T_{obs}+1}^{T_{pred}} \log p\left(p_n^t \mid \hat{\mu}_{x,n}^t, \hat{\mu}_{y,n}^t, \hat{\sigma}_{x,n}^t, \hat{\sigma}_{y,n}^t, \hat{\rho}_n^t\right)$
where $p_n^t$ is the ground-truth position of pedestrian $n$ at time step $t$. This loss measures how well the predicted trajectory distributions match the actual trajectories; model parameters are then corrected through backpropagation.
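For reference, a direct transcription of the bivariate Gaussian negative log-likelihood (scalar or array inputs broadcast; variable names are illustrative):

```python
import numpy as np

def bivariate_nll(x, y, mu_x, mu_y, sig_x, sig_y, rho):
    """Negative log-likelihood of ground-truth coordinates (x, y)
    under predicted bivariate Gaussian parameters. rho is the
    predicted correlation coefficient, |rho| < 1."""
    zx = (x - mu_x) / sig_x
    zy = (y - mu_y) / sig_y
    # Mahalanobis-like quadratic form of the bivariate Gaussian.
    z = zx ** 2 - 2 * rho * zx * zy + zy ** 2
    # Log of the normalization constant 2*pi*sx*sy*sqrt(1 - rho^2).
    log_det = np.log(2 * np.pi * sig_x * sig_y * np.sqrt(1 - rho ** 2))
    return np.sum(z / (2 * (1 - rho ** 2)) + log_det)
```

At the predicted mean with unit variances and zero correlation, the per-point loss reduces to $\log(2\pi)$, a convenient sanity check.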

3. Experiments

3.1. Datasets

We conduct comprehensive evaluations on three public benchmark datasets, which include:
ETH/UCY datasets: ETH [33] and UCY [34] datasets are widely used benchmarks for pedestrian trajectory prediction. These datasets contain five bird’s-eye view scenes (ETH, HOTEL, UNIV, ZARA1, ZARA2) with 1536 trajectory sequences in total. These datasets cover various walking patterns and social interactions of pedestrians in multi-density scenarios. Both datasets sample pedestrian trajectories every 0.4 s. Consistent with mainstream methods [12,13,17,19,27], we employ Leave-One-Out (LOO) cross-validation, where each scene serves as the test set once while the remaining scenes form the training and validation sets.
Stanford Drone Dataset (SDD): The SDD [35] is a bird’s-eye view dataset of pedestrian trajectories captured by drones. The dataset contains multiple university campus scenarios, sampling pedestrian trajectories every 0.4 s. Following the settings of mainstream methods [36,37], we divide the training set into three subsets for model training, validation, and testing, respectively.

3.2. Evaluation Metrics

Following mainstream pedestrian trajectory prediction methods, we select Average Displacement Error (ADE) and Final Displacement Error (FDE) as evaluation metrics.
Average Displacement Error (ADE): Average L2 distance between predicted trajectories and ground truth over all prediction time steps.
$ADE = \frac{\sum_{i=1}^{N} \sum_{t=T_{obs}+1}^{T_{pred}} \left\| \hat{p}_i^t - p_i^t \right\|_2}{N\left(T_{pred} - T_{obs}\right)}$
Final Displacement Error (FDE): The L2 distance between the predicted destination and the actual destination.
$FDE = \frac{\sum_{i=1}^{N} \left\| \hat{p}_i^{T_{pred}} - p_i^{T_{pred}} \right\|_2}{N}$
To comprehensively evaluate the multimodal characteristics of trajectory prediction, we also introduced the Average Pairwise Displacement (APD) and Final Pairwise Displacement (FPD) metrics to measure the diversity of predicted trajectories [38].
Average Pairwise Displacement (APD): The average L2 distance between all predicted sample pairs.
$APD = \frac{\sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{t=T_{obs}+1}^{T_{pred}} \left\| \hat{p}_n^{(i),t} - \hat{p}_n^{(j),t} \right\|_2}{M^2\left(T_{pred} - T_{obs}\right)}$
Final Pairwise Displacement (FPD): The average L2 distance of all predicted sample pairs at the last time step.
$FPD = \frac{\sum_{i=1}^{M} \sum_{j=1}^{M} \left\| \hat{p}_n^{(i),T_{pred}} - \hat{p}_n^{(j),T_{pred}} \right\|_2}{M^2}$
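The displacement metrics can be sketched as follows for one set of predicted trajectories; the (N, T, 2) array layout is an assumption, and the best-of-20 sample selection described in Section 3.3 would wrap this computation:

```python
import numpy as np

def ade_fde(pred, gt):
    """ADE and FDE for one sample set.

    pred, gt: (N, T_pred, 2) arrays of predicted and ground-truth
    positions over the prediction horizon.
    """
    # Per-pedestrian, per-step L2 displacement errors, shape (N, T_pred).
    dist = np.linalg.norm(pred - gt, axis=-1)
    ade = dist.mean()          # average over all pedestrians and steps
    fde = dist[:, -1].mean()   # error at the final time step only
    return ade, fde
```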

3.3. Experimental Settings

All experiments are implemented with the PyTorch 2.4.1 deep learning framework on an NVIDIA GeForce RTX 4060 GPU. We employ Adam as the optimizer. The model is trained for 150 epochs, with an initial learning rate of 0.01 for the first 50 epochs, decayed by a factor of 0.1 for the subsequent epochs.
Following mainstream pedestrian trajectory prediction research schemes [17,19,27], we use 8 frames (3.2 s) as observed trajectories to predict pedestrian trajectories for the next 12 frames (4.8 s). For multimodal prediction, we sample 20 trajectories from the predicted distribution and compute evaluation metrics for each sample. The sample with the best metrics is selected as the final socially acceptable trajectory for each pedestrian; its metric values are recorded, and the metrics are then averaged across all datasets.

3.4. Quantitative Analysis

3.4.1. Comparison with Existing Methods

To evaluate the prediction performance of our method, we compare our method with existing baseline methods on ETH and UCY datasets, with results shown in Table 1.
From the comparison results, our method achieves competitive performance. Compared with classic trajectory prediction methods, our method significantly improves prediction performance. Compared with the recent GNN-based method DSTIGCN, our method reduces ADE by 5.6% and FDE by 9.4%. These performance improvements are primarily attributed to two core design features. First, a sparse and geometrically sound spatial interaction graph is constructed using Delaunay triangulation (Section 2.3), effectively removing redundant connections while preserving key neighboring interactions. This provides a superior graph topology foundation for higher-order social interaction modeling. Second, a density-adaptive higher-order graph convolution mechanism is designed (Section 2.4), enabling accurate capture of multi-layered indirect interactions between pedestrians.
Furthermore, to evaluate the generalization of our method, we compared its prediction performance with existing methods on the SDD, as shown in Table 2. Our method achieves better prediction results on the SDD, further validating the effectiveness of the proposed method.
To evaluate the diversity of predicted trajectories by our method, we conducted trajectory diversity assessments comparing our method with other GNN-based pedestrian trajectory prediction methods, with the number of samples set to 20 for all methods. The results are shown in Table 3. The results demonstrate that our method performs well on diversity metrics. It is noteworthy that Social-STGCNN exhibits higher diversity, but its prediction accuracy (ADE/FDE) is lower than our method. This indicates that our method achieves a better balance between prediction accuracy and diversity.

3.4.2. Model Parameters and Inference Time

Model parameters and inference times reflect model size and real-time performance. To verify the real-time performance of our method, we select several classic models including RNN-based trajectory prediction methods, Transformer-based trajectory prediction methods, and GNN-based trajectory prediction methods. Results are shown in Table 4.
Benefiting from the GNN architecture, our method has fewer model parameters and faster inference than RNN- and Transformer-based trajectory prediction methods. Compared with other GNN-based methods, it achieves substantially better overall prediction performance without excessively increasing parameters or inference time.
To further analyze the efficiency bottlenecks of our method, we conducted temporal decomposition analysis of the model’s inference process, focusing on the Delaunay triangulation and higher-order graph convolution modules. Figure 8 presents the average inference time statistics for different modules across different pedestrian count ranges on the ETH/UCY dataset.
As can be observed, with increasing numbers of pedestrians, the time costs of all modules show an upward trend, with the higher-order graph convolution module occupying the primary computation time. These analyses provide clear directions for subsequent efficiency optimization. Adopting more efficient graph neural network architectures holds promise for improving the model’s computational efficiency.

3.5. Ablation Experiments

3.5.1. Component Ablation Experiments

We conduct ablation experiments to verify the effectiveness of each component. Here, DTM and HGM denote the Delaunay triangulation module and the higher-order graph module, respectively. The baseline model is the SGCN framework with the original fixed-threshold graph sparsification module removed and neither the DTM nor the HGM introduced. Results are shown in Table 5.
Table 5 demonstrates that both components contribute positively to performance. The Delaunay triangulation module eliminates redundant interactions in higher-order interaction modeling, reducing ADE by 5.3% and FDE by 6.1%. The higher-order graph module captures complex social relationships among pedestrians by modeling higher-order interactions, reducing ADE by 2.6% and FDE by 3.0%. When the DTM and HGM are used together, the model achieves its best performance, demonstrating the synergy between the two components. These results verify the rationality of the proposed architectural design and the necessity of each component.

3.5.2. Graph Construction Method Analysis

To verify the superiority of Delaunay triangulation in graph construction, we design comparative experiments for graph construction methods. Table 6 shows a performance comparison of different graph construction methods. Compared with fully connected graph, Delaunay triangulation reduces ADE and FDE by 8.1% and 9.4%, respectively. Compared with physical constraint methods, it reduces ADE and FDE by 5.6% and 7.9%, respectively, verifying its advantages in constructing reasonable graph structures.

3.5.3. Graph Convolution Order Analysis

To investigate the impact of convolution order on model prediction, we conduct higher-order graph module ablation experiments. Figure 9 shows the impact of different orders on prediction performance.
Experiments show that with a fixed maximum order, trajectory prediction error decreases continuously as the order increases from 1 to 3, indicating that higher-order information helps improve prediction accuracy. Beyond order 3, however, the error begins to rise, indicating that excessively high orders introduce redundant interaction information and actually degrade prediction performance. Among the fixed-order settings, order 3 performs best on the ADE/FDE metrics, verifying the reasonableness of selecting orders 1–3 as the adaptive range.
From the computational efficiency perspective, although model parameters are similar across orders (23.4 K–23.6 K), inference time grows significantly with order, from 0.0030 s for order 1 to 0.0136 s for order 5, an approximately linear trend. Notably, our adaptive order method achieves the best prediction performance with an inference time of 0.0062 s, significantly lower than the 0.0085 s of fixed order 3. This indicates that the density-adaptive mechanism improves computational efficiency by computing only the necessary convolution orders while preserving prediction accuracy, striking a balance between accuracy and efficiency.

3.5.4. Density Threshold Sensitivity Analysis

Density thresholds θ 1 and θ 2 in the higher-order graph convolution module directly affect order allocation strategies. Excessively low thresholds lead to too many scenarios being judged as high-density, limiting the utilization of higher-order information. Excessively high thresholds cause excessive use of higher-order convolution, introducing noise. To determine optimal thresholds, we conducted grid search experiments on the ETH/UCY dataset, with results shown in Figure 10.
Results demonstrate that the model achieves optimal prediction performance when the density thresholds are set to $\theta_1 = 0.14$ and $\theta_2 = 0.30$; raising or lowering either threshold leads to performance degradation.
To verify the generalizability of these density thresholds, we conducted additional sensitivity analysis on the SDD with results shown in Table 7. The results indicate that the optimal thresholds determined from the ETH/UCY dataset achieve optimal performance on SDD, demonstrating that these thresholds remain robust on the SDD.
Although the test results across datasets demonstrate the parameters’ generalizability, re-tuning according to specific scenarios may be beneficial, such as accounting for cultural differences where different societies may have different personal space norms.

3.5.5. First-Frame Caching Mechanism Analysis

Our proposed first-frame caching mechanism assumes that crowd density does not change significantly within the observation and prediction time horizon (8 s). To support the validity of this assumption, we conducted density stability statistical analysis on the ETH/UCY dataset. Figure 11 shows the distribution of absolute density changes from the first frame to the last frame. The median values across all five scenarios range from 0.05 to 0.07, validating that density changes are limited in most cases.
However, for scenarios with substantial density changes, the first-frame caching mechanism may lead to prediction accuracy degradation. To assess the impact of these situations on model performance, we conducted supplementary comparative experiments comparing the first-frame caching strategy with the frame-by-frame recalculation strategy, with results shown in Table 8. The results demonstrate that the first-frame caching strategy exhibits only a 0.6% reduction in prediction accuracy compared to the frame-by-frame calculation strategy, while reducing inference time by 13.9%. This proves that the first-frame caching mechanism effectively improves computational efficiency while maintaining prediction accuracy.

3.6. Qualitative Analysis

Figure 12 visualizes predictions across four representative scenarios. Trajectory prediction results for each scenario are based on 20 candidate trajectories generated by Monte Carlo sampling. We generate continuous probability distribution visualizations using kernel density estimation.
To intuitively demonstrate the superiority of our method, multiple trajectory sample comparisons were generated by comparing predicted trajectory distributions of our method with Social-STGCNN and IMGCN. Figure 13 shows the prediction result visualizations of different methods in typical scenarios.
Because it uses fully connected graphs that consider the influence of all neighbors, Social-STGCNN produces predicted trajectories with unnecessary deviations. IMGCN employs fixed physical constraints that impose a limited field of view, failing to adequately capture the coordination of fellow travelers in parallel-walking scenarios. By retaining only the necessary connections through Delaunay triangulation, our method achieves more accurate predictions. Furthermore, the density-adaptive order selection mechanism enables our method to flexibly adjust the convolution order configuration to the scene's pedestrian density characteristics, maintaining stable prediction performance across different density scenarios.

4. Conclusions

4.1. Summary

This paper proposes a pedestrian trajectory prediction method based on Delaunay triangulation and density-adaptive higher-order graph convolution, addressing two key challenges in GNN-based trajectory prediction: balancing the reduction in redundant connections with the preservation of critical interaction relationships and the lack of scene-adaptive capability in higher-order graph convolution. The main contributions and experimental results are summarized as follows:
  • Delaunay triangulation-based sparse graph construction: We introduce Delaunay triangulation from computational geometry into pedestrian social interaction graph construction. By exploiting its geometrically optimal empty circle property, this approach maintains essential spatial proximity while eliminating redundant connections effectively, providing more reasonable topological foundations for subsequent graph convolution operations, thereby improving prediction accuracy. Compared to the fully connected graph method, this method reduces ADE and FDE by 10.5% and 13.4%, respectively. Compared to the physics-constrained method, it achieves reductions of 8.1% in ADE and 10.8% in FDE, demonstrating this method’s advantages in constructing reasonable graph structures.
  • Density-adaptive higher-order graph convolution: We design a density-adaptive order selection mechanism that dynamically adjusts graph convolution order based on scene density. Low-density scenarios employ third-order convolution to capture long-range indirect interactions, medium-density scenarios use second-order convolution to balance direct and indirect influences, and high-density scenarios adopt first-order convolution to avoid visual occlusion interference. Ablation experiments show that compared to mainstream fixed first-order settings, this mechanism reduces ADE and FDE by 5.6% and 14.7%, respectively. Compared to the best-performing fixed third-order settings, it maintains comparable accuracy while reducing inference time from 0.0085 s to 0.0062 s, achieving a balance between accuracy and efficiency.
  • Efficient computational optimization strategy: For characteristics of sequence prediction tasks, we design a first-frame caching mechanism to reduce algorithm time complexity. Simultaneously, we propose a masked adaptive weight fusion module achieving dynamic weighted combination of different order features, effectively addressing feature alignment issues under dynamic order configurations.

4.2. Future Work

While the proposed method demonstrates promising performance, several directions warrant further investigation:
  • Fusion and Modeling of Multimodal Interaction Information: Current methods primarily focus on modeling pedestrian social interactions based on positional geometric relationships, with limited consideration of environmental constraint factors (such as static obstacles and road topology) and pedestrian motion attributes (such as velocity direction and target intent). Although spatial attention mechanisms can implicitly learn some pedestrian information from historical trajectories, when handling geometrically adjacent pedestrian pairs moving in opposite directions, explicitly incorporating velocity direction and target intent information may further enhance the model’s discriminative capability in more complex scenarios. Future work will explore integrating multimodal information such as semantic scene information, velocity vectors, and target intent into graph node features or edge weight calculations, further improving the model’s prediction accuracy and interpretability in complex heterogeneous scenarios.
  • More Refined Scene-Adaptive Mechanisms: This paper employs an order selection strategy based on scene average density, but this method still relies on manually set hyperparameters that may require readjustment for specific application scenarios. Moreover, using a single global variable within the same frame may have limitations in scenarios with severe density heterogeneity. Future work will explore designing end-to-end learnable order selection networks, with dynamic decision modules based on deep reinforcement learning enabling the model to autonomously learn order decision strategies. For scenarios with severe density heterogeneity, we will investigate the feasibility of dynamically selecting independent orders for each pedestrian node, achieving more refined scene-adaptive modeling while ensuring computational efficiency, further enhancing the model’s adaptive capabilities across different scenarios.
  • Engineering deployment and optimization: Current experiments are limited to offline evaluations. Future work will deploy the method to actual autonomous driving platforms, validating its performance in real environments. Additionally, for resource constraints of edge computing devices like mobile robots, we will research model compression and quantization acceleration techniques. These efforts aim to achieve further balance between real-time performance and accuracy, promoting algorithm transition from theoretical research to practical applications.

Author Contributions

Conceptualization, Lei Chen and Jiajia Li; methodology, Jun Xiao; software, Rui Liu; validation, Jiajia Li; formal analysis, Lei Chen and Jiajia Li; investigation, Jiajia Li and Rui Liu; resources, Lei Chen; data curation, Jiajia Li and Rui Liu; writing—original draft preparation, Lei Chen and Jiajia Li; writing—review and editing, Lei Chen; visualization, Rui Liu; supervision, Jun Xiao; project administration, Lei Chen and Jun Xiao; funding acquisition, Lei Chen. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangxi Science and Technology Major Special Program (Grant AA23062024 and AA23062066).

Data Availability Statement

The datasets presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rudenko, A.; Palmieri, L.; Herman, M.; Kitani, K.M.; Gavrila, D.M.; Arras, K.O. Human Motion Trajectory Prediction: A Survey. Int. J. Robot. Res. 2020, 39, 895–935. [Google Scholar] [CrossRef]
  2. Jiang, J.; Yan, K.; Xia, X.; Yang, B. A Survey of Deep Learning-Based Pedestrian Trajectory Prediction: Challenges and Solutions. Sensors 2025, 25, 957. [Google Scholar] [CrossRef]
  3. Huang, Y.; Du, J.; Yang, Z.; Zhou, Z.; Zhang, L.; Chen, H. A Survey on Trajectory-Prediction Methods for Autonomous Driving. IEEE Trans. Intell. Veh. 2022, 7, 652–674. [Google Scholar] [CrossRef]
  4. Bharilya, V.; Kumar, N. Machine Learning for Autonomous Vehicle’s Trajectory Prediction: A Comprehensive Survey, Challenges, and Future Research Directions. Veh. Commun. 2024, 46, 100733. [Google Scholar] [CrossRef]
  5. Zhang, C.; Berger, C. Pedestrian Behavior Prediction Using Deep Learning Methods for Urban Scenarios: A Review. IEEE Trans. Intell. Transp. Syst. 2023, 24, 10279–10301. [Google Scholar] [CrossRef]
  6. Schöller, C.; Aravantinos, V.; Lay, F.; Knoll, A. What the Constant Velocity Model Can Teach Us About Pedestrian Motion Prediction. IEEE Robot. Autom. Lett. 2020, 5, 1696–1703. [Google Scholar] [CrossRef]
  7. Zhang, L.; Yuan, K.; Chu, H.; Huang, Y.; Ding, H.; Yuan, J.; Chen, H. Pedestrian Collision Risk Assessment Based on State Estimation and Motion Prediction. IEEE Trans. Veh. Technol. 2022, 71, 98–111. [Google Scholar] [CrossRef]
  8. Helbing, D.; Molnár, P. Social Force Model for Pedestrian Dynamics. Phys. Rev. E 1995, 51, 4282–4286. [Google Scholar] [CrossRef]
  9. Bi, H.; Fang, Z.; Mao, T.; Wang, Z.; Deng, Z. Joint Prediction for Kinematic Trajectories in Vehicle-Pedestrian-Mixed Scenes. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 10382–10391. [Google Scholar]
  10. Cheng, H.; Sester, M. Modeling Mixed Traffic in Shared Space Using LSTM with Probability Density Mapping. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 3898–3904. [Google Scholar]
  11. Lee, N.; Choi, W.; Vernaza, P.; Choy, C.B.; Torr, P.H.S.; Chandraker, M. DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2165–2174. [Google Scholar]
  12. Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social LSTM: Human Trajectory Prediction in Crowded Spaces. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 961–971. [Google Scholar]
  13. Gupta, A.; Johnson, J.; Fei-Fei, L.; Savarese, S.; Alahi, A. Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 2255–2264. [Google Scholar]
  14. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010. [Google Scholar]
  15. Giuliari, F.; Hasan, I.; Cristani, M.; Galasso, F. Transformer Networks for Trajectory Forecasting. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 10335–10342. [Google Scholar]
  16. Yuan, Y.; Weng, X.; Ou, Y.; Kitani, K. AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9793–9803. [Google Scholar]
  17. Mohamed, A.; Qian, K.; Elhoseiny, M.; Claudel, C. Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 14412–14420. [Google Scholar]
  18. Huang, Y.; Bi, H.; Li, Z.; Mao, T.; Wang, Z. STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6271–6280. [Google Scholar]
  19. Shi, L.; Wang, L.; Long, C.; Zhou, S.; Zhou, M.; Niu, Z.; Hua, G. SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8990–8999. [Google Scholar]
  20. Sang, H.; Chen, W.; Wang, J.; Zhao, Z. RDGCN: Reasonably Dense Graph Convolution Network for Pedestrian Trajectory Prediction. Measurement 2023, 213, 112675. [Google Scholar] [CrossRef]
  21. Zhang, X.; Angeloudis, P.; Demiris, Y. Dual-Branch Spatio-Temporal Graph Neural Networks for Pedestrian Trajectory Prediction. Pattern Recognit. 2023, 142, 109633. [Google Scholar] [CrossRef]
  22. Chen, W.; Sang, H.; Wang, J.; Zhao, Z. IMGCN: Interpretable Masked Graph Convolution Network for Pedestrian Trajectory Prediction. Transp. B Transp. Dyn. 2024, 12, 2389896. [Google Scholar] [CrossRef]
  23. Abu-El-Haija, S.; Perozzi, B.; Kapoor, A.; Alipourfard, N.; Lerman, K.; Harutyunyan, H.; Steeg, G.V.; Galstyan, A. MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 21–29. [Google Scholar]
  24. Kim, S.; Chi, H.; Lim, H.; Ramani, K.; Kim, J.; Kim, S. Higher-Order Relational Reasoning for Pedestrian Trajectory Prediction. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 15251–15260. [Google Scholar]
  25. Chen, W.; Sang, H.; Zhao, Z. PCHGCN: Physically Constrained Higher-Order Graph Convolutional Network for Pedestrian Trajectory Prediction. IEEE Internet Things J. 2025, 12, 25033–25045. [Google Scholar] [CrossRef]
  26. Wen, D.; Xu, H.; He, Z.; Wu, Z.; Tan, G.; Peng, P. Density-Adaptive Model Based on Motif Matrix for Multi-Agent Trajectory Prediction. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 14822–14832. [Google Scholar] [CrossRef]
  27. Chen, W.; Sang, H.; Wang, J.; Zhao, Z. DSTIGCN: Deformable Spatial-Temporal Interaction Graph Convolution Network for Pedestrian Trajectory Prediction. IEEE Trans. Intell. Transp. Syst. 2025, 26, 6923–6935. [Google Scholar] [CrossRef]
  28. Fall, A.; Fortin, M.-J.; Manseau, M.; O’Brien, D. Spatial Graphs: Principles and Applications for Habitat Connectivity. Ecosystems 2007, 10, 448–461. [Google Scholar] [CrossRef]
  29. Feng, J.; Fu, J.; Shang, C.; Lin, Z.; Niu, X.; Li, B. Efficient Generation Strategy for Hierarchical Porous Scaffolds with Freeform External Geometries. Addit. Manuf. 2020, 31, 100943. [Google Scholar] [CrossRef]
  30. Yuan, W.; Zhu, J.; Wang, N.; Zhang, W.; Dai, B.; Jiang, Y.; Wang, Y. A Dynamic Large-Deformation Particle Finite Element Method for Geotechnical Applications Based on Abaqus. J. Rock Mech. Geotech. Eng. 2023, 15, 1859–1871. [Google Scholar] [CrossRef]
  31. Fradi, H.; Luvison, B.; Quoc, C.P. Crowd Behavior Analysis Using Local Mid-Level Visual Descriptors. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 589–602. [Google Scholar] [CrossRef]
  32. Hall, E.T. The Hidden Dimension; Anchor Books: New York, NY, USA, 1990. [Google Scholar]
  33. Pellegrini, S.; Ess, A.; Schindler, K.; Van Gool, L. You’ll Never Walk Alone: Modeling Social Behavior for Multi-Target Tracking. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 27 September–4 October 2009; pp. 261–268. [Google Scholar]
  34. Lerner, A.; Chrysanthou, Y.; Lischinski, D. Crowds by Example. Comput. Graph. Forum 2007, 26, 655–664. [Google Scholar] [CrossRef]
  35. Robicquet, A.; Sadeghian, A.; Alahi, A.; Savarese, S. Learning Social Etiquette: Human Trajectory Understanding in Crowded Scenes. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 549–565. [Google Scholar]
  36. Monti, A.; Bertugli, A.; Calderara, S.; Cucchiara, R. DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 2551–2558. [Google Scholar]
  37. Mohamed, A.; Zhu, D.; Vu, W.; Elhoseiny, M.; Claudel, C. Social-Implicit: Rethinking Trajectory Prediction Evaluation and The Effectiveness of Implicit Maximum Likelihood Estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 463–479. [Google Scholar]
  38. Chen, J.; Cao, J.; Lin, D.; Kitani, K.; Pang, J. MGF: Mixed Gaussian Flow for Diverse Trajectory Prediction. In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada, 9–15 December 2024; pp. 57539–57563. [Google Scholar]
  39. Zhang, P.; Ouyang, W.; Zhang, P.; Xue, J.; Zheng, N. SR-LSTM: State Refinement for LSTM towards Pedestrian Trajectory Prediction. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 12077–12086. [Google Scholar]
  40. Lian, J.; Ren, W.; Li, L.; Zhou, Y.; Zhou, B. PTP-STGCN: Pedestrian Trajectory Prediction Based on a Spatio-Temporal Graph Convolutional Neural Network. Appl. Intell. 2023, 53, 2862–2878. [Google Scholar] [CrossRef]
  41. Fang, Y.; Jin, Z.; Cui, Z.; Yang, Q.; Xie, T.; Hu, B. Modeling Human-Human Interaction with Attention-Based High-Order GCN for Trajectory Prediction. Vis. Comput. 2022, 38, 2257–2269. [Google Scholar] [CrossRef]
  42. Youssef, T.; Zemmouri, E.; Bouzid, A. STM-GCN: A Spatiotemporal Multi-Graph Convolutional Network for Pedestrian Trajectory Prediction. J. Supercomput. 2023, 79, 20923–20937. [Google Scholar] [CrossRef]
  43. Sun, C.; Wang, B.; Leng, J.; Zhang, X.; Wang, B. SDAGCN: Sparse Directed Attention Graph Convolutional Network for Spatial Interaction in Pedestrian Trajectory Prediction. IEEE Internet Things J. 2024, 11, 39225–39235. [Google Scholar] [CrossRef]
  44. Yu, C.; Ma, X.; Ren, J.; Zhao, H.; Yi, S. Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 507–523. [Google Scholar]
Figure 1. Existing methods for modeling pedestrian social interactions. (a) Fully connected pedestrian social interactions; (b) physical-constrained pedestrian social interactions.
Figure 2. Schematic diagram of the interaction range of pedestrian multi-hop neighbors. (a) 1-hop neighbor; (b) 2-hop neighbor; (c) 3-hop neighbor.
Figure 3. SGCN model network structure.
Figure 4. Reconstructed model network structure.
Figure 5. Schematic diagram illustrating the empty circle property of Delaunay triangulation. (a) Triangulation satisfying the empty circle property; (b) Triangulation not satisfying the empty circle property.
Figure 6. Pedestrian interaction topology based on Delaunay triangulation.
Figure 7. Density-adaptive higher-order graph convolution process.
Figure 8. Decomposition of model inference time.
Figure 9. Performance comparison of different graph convolution orders. (a) Comparison of ADE/FDE at different orders; (b) comparison of inference time at different orders.
Figure 10. Comparison of ADE/FDE with different density thresholds. (a) Comparison of ADE with different density thresholds; (b) comparison of FDE with different density thresholds. Red and blue frames indicate the best values for ADE and FDE, respectively.
Figure 11. Distribution of absolute difference in pedestrian density between the first and last frames within the observation and prediction time horizon.
Figure 12. Visualization of predicted trajectories in different scenarios. (a) Parallel walking; (b) walking towards each other; (c) group movement; (d) dense intersection. Red and blue lines denote observed and ground-truth trajectories, respectively, while the color spectrum represents trajectory distributions of different pedestrians.
Figure 13. Visual comparison of predicted trajectories using different trajectory prediction methods. (a) Parallel walking; (b) walking towards each other; (c) group movement; (d) dense intersection. Red and blue lines represent observed and ground-truth trajectories, respectively, yellow lines are the predicted trajectories of various methods, orange circles mark the obvious areas of prediction error.
Table 1. ADE/FDE metrics for our method and existing baseline methods on ETH/UCY datasets (unit: m). Lower values are better. Optimal and suboptimal values are indicated by bold and underline, respectively.
Method | Year | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | AVG
Social-LSTM [12] | 2016 | 1.09/2.35 | 0.79/1.76 | 0.67/1.40 | 0.47/1.00 | 0.56/1.17 | 0.72/1.54
Social-GAN [13] | 2018 | 0.81/1.52 | 0.72/1.61 | 0.60/1.26 | 0.34/0.69 | 0.42/0.84 | 0.58/1.18
SR-LSTM [39] | 2019 | 0.63/1.25 | 0.37/0.74 | 0.51/1.10 | 0.41/0.90 | 0.32/0.70 | 0.45/0.94
Social-STGCNN [17] | 2020 | 0.64/1.11 | 0.49/0.85 | 0.44/0.79 | 0.34/0.53 | 0.30/0.48 | 0.44/0.75
SGCN [19] | 2021 | 0.63/1.03 | 0.32/0.55 | 0.37/0.70 | 0.29/0.53 | 0.25/0.45 | 0.37/0.65
PTP-STGCN [40] | 2022 | 0.63/1.04 | 0.34/0.45 | 0.48/0.87 | 0.37/0.61 | 0.30/0.46 | 0.42/0.68
High-order GCN [41] | 2022 | 0.54/1.09 | 0.24/0.44 | 0.53/1.14 | 0.41/0.89 | 0.32/0.70 | 0.41/0.85
STMGCN [42] | 2023 | 0.73/1.13 | 0.31/0.42 | 0.45/0.85 | 0.33/0.53 | 0.29/0.46 | 0.42/0.67
IMGCN [22] | 2024 | 0.61/0.82 | 0.31/0.45 | 0.37/0.67 | 0.29/0.51 | 0.24/0.42 | 0.36/0.57
SDAGCN [43] | 2024 | 0.73/1.20 | 0.34/0.46 | 0.48/0.87 | 0.35/0.55 | 0.32/0.52 | 0.44/0.72
HighGraph [24] | 2024 | 0.60/0.93 | 0.31/0.40 | 0.40/0.70 | 0.33/0.49 | 0.29/0.45 | 0.39/0.59
DSTIGCN w/o LHS [27] | 2025 | 0.60/1.00 | 0.33/0.56 | 0.37/0.70 | 0.28/0.50 | 0.24/0.43 | 0.36/0.64
Ours | - | 0.56/0.88 | 0.29/0.45 | 0.35/0.68 | 0.29/0.45 | 0.23/0.43 | 0.34/0.58
Table 2. Comparison of our method with existing methods on the SDD (unit: m). Lower values are better, the best values are indicated in bold.
Metric | STGAT [18] | DAG-Net [36] | SGCN [19] | Social-Implicit [37] | IMGCN [22] | Ours
ADE | 0.58 | 0.53 | 0.46 | 0.47 | 0.46 | 0.43
FDE | 1.11 | 1.04 | 0.75 | 0.89 | 0.74 | 0.72
Table 3. APD/FPD metrics for our method and existing GNN-based methods on ETH/UCY datasets (unit: m). Higher values are better, the best values are indicated in bold.
Method | Year | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | AVG
Social-STGCNN [17] | 2020 | 0.40/0.63 | 0.59/0.92 | 0.33/0.50 | 0.49/0.76 | 0.42/0.66 | 0.45/0.69
IMGCN [22] | 2024 | 0.48/0.87 | 0.49/0.90 | 0.35/0.62 | 0.34/0.59 | 0.28/0.48 | 0.39/0.69
DSTIGCN w/o LHS [27] | 2025 | 0.52/1.08 | 0.39/0.67 | 0.33/0.56 | 0.30/0.56 | 0.25/0.46 | 0.36/0.67
Ours | - | 0.59/1.08 | 0.46/0.83 | 0.33/0.57 | 0.34/0.59 | 0.28/0.49 | 0.40/0.71
Table 4. Comparison of our method with other methods in terms of model parameters and inference time. Lower values are better, K and s represent thousands and seconds, respectively.
Method | Characteristic | Model Parameters | Inference Time
Social-LSTM [12] | RNN | 264 K | 0.2188 s
SR-LSTM [39] | RNN | 64.9 K | 0.0708 s
Social-GAN [13] | RNN | 46.3 K | 0.0551 s
TF [15] | Transformer | 33,082.8 K | 0.0532 s
STAR [44] | Transformer | 964.9 K | 0.0214 s
Social-STGCNN [17] | GNN | 7.6 K | 0.0020 s
SGCN [19] | GNN | 25 K | 0.0023 s
RDGCN [20] | GNN | 28 K | 0.0025 s
IMGCN [22] | GNN | 23.3 K | 0.0030 s
Ours | GNN | 23.6 K | 0.0062 s
Table 5. Ablation studies on model components. Lower values are better. “√” indicates the component is included in the model, “-” indicates the component is removed from the model.
DTM | HGM | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | AVG
- | - | 0.64/1.13 | 0.34/0.56 | 0.38/0.69 | 0.29/0.54 | 0.25/0.42 | 0.38/0.66
√ | - | 0.56/0.86 | 0.33/0.59 | 0.37/0.57 | 0.29/0.51 | 0.25/0.45 | 0.36/0.62
- | √ | 0.62/1.01 | 0.34/0.59 | 0.36/0.66 | 0.29/0.52 | 0.25/0.43 | 0.37/0.64
√ | √ | 0.56/0.88 | 0.29/0.45 | 0.35/0.68 | 0.29/0.45 | 0.23/0.43 | 0.34/0.58
Table 6. Performance comparison of different graph construction methods.
Graph Construction Method | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | AVG
Fully connected | 0.62/1.01 | 0.34/0.59 | 0.36/0.66 | 0.29/0.52 | 0.25/0.43 | 0.37/0.64
Physics-constrained | 0.59/1.03 | 0.32/0.53 | 0.37/0.67 | 0.30/0.50 | 0.24/0.42 | 0.36/0.63
Delaunay triangulation (Ours) | 0.56/0.88 | 0.29/0.45 | 0.35/0.68 | 0.29/0.45 | 0.23/0.43 | 0.34/0.58
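For the "Delaunay triangulation" graph construction compared in Table 6, the adjacency structure can be derived directly from the triangulation's edges. The following is a minimal sketch of that idea using `scipy.spatial.Delaunay`; the function name and matrix conventions are illustrative assumptions, not the authors' released code.

```python
# Sketch: sparse pedestrian adjacency from a Delaunay triangulation.
# Each triangle contributes its three sides as undirected edges, which
# preserves proximity relations while avoiding full connectivity.
import numpy as np
from scipy.spatial import Delaunay

def delaunay_adjacency(positions):
    """positions: (N, 2) array of pedestrian coordinates, N >= 3.

    Returns an (N, N) binary adjacency matrix whose nonzero entries
    are exactly the edges of the Delaunay triangulation.
    """
    n = len(positions)
    adj = np.zeros((n, n), dtype=np.float32)
    tri = Delaunay(positions)
    for a, b, c in tri.simplices:            # each row is one triangle
        for i, j in ((a, b), (b, c), (a, c)):
            adj[i, j] = adj[j, i] = 1.0      # undirected edge
    return adj
```

In a full model this binary matrix would typically be weighted (e.g., by inter-pedestrian distance or attention scores) before graph convolution.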
Table 7. Comparison of ADE/FDE with different density thresholds on the SDD (unit: m).
θ1 \ θ2 | 0.26 | 0.30 | 0.34
0.10 | 0.43/0.73 | 0.43/0.73 | 0.44/0.72
0.14 | 0.44/0.73 | 0.43/0.72 | 0.4/0.72
0.18 | 0.44/0.74 | 0.44/0.73 | 0.44/0.73
Table 8. Performance comparison between first-frame caching mechanism and frame-by-frame computation mechanism.
Method | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | AVG | Inference Time
Frame-by-frame computation | 0.56/0.86 | 0.28/0.45 | 0.35/0.69 | 0.29/0.44 | 0.23/0.42 | 0.34/0.57 | 0.0072 s
First-frame caching | 0.56/0.88 | 0.29/0.45 | 0.35/0.68 | 0.29/0.45 | 0.23/0.43 | 0.34/0.58 | 0.0062 s
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chen, L.; Li, J.; Xiao, J.; Liu, R. Pedestrian Trajectory Prediction Based on Delaunay Triangulation and Density-Adaptive Higher-Order Graph Convolutional Network. ISPRS Int. J. Geo-Inf. 2026, 15, 42. https://doi.org/10.3390/ijgi15010042
