1. Introduction
In urban navigation and personal safety, reliable and precise pedestrian localization and tracking systems play a critical role. Pedestrian trajectory prediction (PTP) [1] is a key technical prerequisite for ensuring the safety of vulnerable road users (VRUs) [2] in autonomous driving systems. Its core task is to model a pedestrian's past and current movement data to infer their historical motion states and thereby predict their future spatiotemporal path [3]. However, pedestrian motion is inherently random and multimodal because intentions are uncertain [4], which makes accurate and robust trajectory prediction a persistent challenge, particularly in complex urban environments.
In pedestrian navigation, high-precision inertial measurement units (IMUs) provide high-frequency motion data, but IMU sensor errors accumulate over time [5,6,7]. Multi-global navigation satellite systems (Multi-GNSS) [8] can improve positioning accuracy, but their signals are easily lost in complex environments such as urban high-rise areas and tunnels [9]. An integrated INS/GNSS navigation system exploits the complementary strengths of the two: inertial navigation maintains accuracy during signal loss, while satellite positioning removes the accumulated IMU error. The key lies in fusing the two types of information effectively [10,11]. To suppress IMU drift, the IMU is often mounted on the foot or leg and the zero velocity update (ZUPT) technique is applied. Traditional ZUPT relies on a fixed threshold and is suitable only for uniform motion, so it adapts poorly to different gaits. In previous studies, Johan et al. [12] used a Bayesian adaptive thresholding method to select a separate threshold for each motion pattern, but this method depended heavily on the number of motion patterns. Cho et al. [13] developed a threshold-free algorithm that detects zero velocity from the signal shape, but it is limited to walking and brisk walking. This article proposes an adaptive unscented Kalman filter localization algorithm based on a gait constraint model. By establishing a pedestrian gait phase constraint model and detecting zero velocity intervals, the heading and step length are optimized according to the motion state. In addition, an adaptive Kalman filter fuses the motion data and corrects position drift during gait, which improves positioning accuracy and robustness in complex environments.
With the rapid development of intelligent transportation, urban planning, and public safety, trajectory prediction has become a research hotspot. Pedestrian trajectory prediction is widely used in scenarios such as autonomous driving, crowd behavior analysis, and intelligent monitoring [14]. It is a challenging task because of human sociality, uncertainty in movement, and environmental factors. Existing methods fall mainly into two groups: data-driven methods and methods that establish explicit motion models. First, most prediction methods rely on observable external stimuli [15], including historical trajectories, kinematic attributes (such as position, velocity, and angular velocity), and contextual information such as road geometry and pedestrian-vehicle interaction [16]. Second, modeling methods include parameterized methods based on kinematics and dynamics, as well as shallow and deep learning techniques [17,18]. These methods are optimized with various loss functions to generate outputs such as Gaussian distributions, multimodal trajectories, or probabilistic grids.
In recent years, deep learning has significantly improved the accuracy and robustness of pedestrian trajectory prediction, making it increasingly important in practical applications such as autonomous driving and robot navigation. However, accurately modeling the spatiotemporal relationships in pedestrian motion, especially in complex scenes with multimodal future behavior, remains a challenge. Among early works, the Social LSTM model proposed by Alahi et al. [19] integrated the hidden states of neighboring pedestrians through a grid-based pooling mechanism, achieving preliminary modeling of pedestrian interaction. Gupta et al. [20] introduced generative adversarial networks (GANs) to handle the multimodal nature of trajectory prediction. However, these methods often rely on predefined interaction features or fixed neighborhood structures, which makes it difficult to explain the complex relationships between pedestrian trajectories. Li et al. [21] used an adaptive spatiotemporal graph construction algorithm that calculates edge weights from dynamic features such as velocity and direction and combines them with temporal characteristics to generate more accurate trajectory predictions. The SHENet framework proposed by Meng et al. [22] uses a memory bank to store historical trajectories and bases its prediction on the relationship between individuals and their surrounding environment. Li, R. et al. [23] combined multi-scale graph-based spatial transformers with trajectory smoothing algorithms to predict multiple paths from historical trajectories. Despite these advances, two issues remain in pedestrian trajectory prediction. First, methods based on spatiotemporal graphs rely too heavily on single-scale interaction graphs and ignore the long-term trajectory relationships of pedestrians. Second, generative models perform well in terms of diversity but often lack explicit mechanisms to ensure temporal consistency, which results in high-frequency turns in the generated paths and neglects pedestrian dynamic behavior.
Pedestrian trajectory prediction relies on current and historical trajectory information, and detecting a pedestrian's motion intention and state from motion behavior improves the accuracy of future trajectory prediction. This article proposes a multi-source data fusion framework with a behavioral attention mechanism for pedestrian trajectory prediction and path planning. INS and GNSS data are used to localize the pedestrian trajectory, while pedestrian gait information is captured as guidance data for the motion features. The framework uses a gated recurrent unit (GRU) encoder to extract key features of pedestrian motion and constructs an adaptive fusion mechanism guided by physical constraints. A memory module stores historical pedestrian trajectories, and an attention-based addressing mechanism assigns different attention weights to trajectory features at different scales. The LSTM decoder works with a spatiotemporally constrained path planning coordination mechanism to decode future pedestrian trajectories, achieving accurate prediction of pedestrian travel paths. The main contributions of this article are summarized as follows:
1. Existing gait-assisted algorithms typically use ZUPT or PDR to estimate step length and heading for displacement-based localization in a two-dimensional plane. The gait constraint localization method proposed in this article distinguishes gait changes and applies ZUPT during the standing phase, which not only updates the velocity but also dynamically adjusts the noise statistics of the filter. In addition, a motion model for the swing phase is constructed to constrain trajectory positioning and extend it to three-dimensional space. The method provides a more detailed decomposition of pedestrian gait and analysis of the motion model, improving the accuracy of pedestrian trajectory localization.
2. Vision-based and radar-based prediction methods are limited by static images, which make it difficult to understand the spatiotemporal connections between pedestrian movements. In contrast, a wearable solution readily captures the complex spatial relationships of pedestrian movement, enabling low-cost prediction of pedestrian position.
3. The method integrates pedestrian gait information into a unified framework for target tracking and future trajectory prediction, achieving end-to-end sharing of information and effectively addressing noise issues in practical scenarios.
4. The historical memory module is introduced into the attention mechanism, which ensures the smoothness and temporal consistency of predicted trajectories based on retrieval features and trajectory embedding.
The structure of this article is as follows:
Section 2 introduces the design framework of the wearable devices used in the study.
Section 3 introduces the gait constrained trajectory localization algorithm and pedestrian trajectory prediction algorithm framework proposed in this study.
Section 4 introduces the results of algorithm simulation environment and experimental scenario testing.
Section 5 is the research conclusion.
3. Methodology
3.1. System Overview
The framework proposed in this article integrates data perception, pedestrian gait analysis and localization, and pedestrian trajectory prediction. Its core is the principle of multi-source sensor fusion: fusing high-frequency IMU data with low-frequency but high-precision GNSS data to achieve stable and accurate pedestrian positioning and motion prediction. The main modules of the system are shown in Figure 2.
The system collects and filters data from the wearable devices and feeds it into the Gait-AUKF algorithm, which integrates IMU and GNSS data for high-precision 3D positioning. Detailed gait analysis is also performed to obtain pedestrian gait data. The output of the algorithm is passed to the prediction module, where gated recurrent units capture the dynamic characteristics of pedestrian motion and a memory-enhanced attention mechanism compresses historical trajectories to establish spatiotemporal correlations that guide future predictions. A long short-term memory network then generates a behaviorally plausible future trajectory, and the A* path planning algorithm is added to physically constrain that trajectory. The final predicted trajectory is therefore both consistent with the pedestrian's intention and physically feasible.
3.2. Pedestrian Dead Reckoning
Pedestrian dead reckoning (PDR) [24] is a sensor-based localization technique that estimates a pedestrian's trajectory in real time by detecting steps, estimating stride length, and determining heading angle. In this paper, the wavelet transform is employed to distinguish gait phases, and the derived step frequency and stride length are incorporated into the PDR algorithm to improve the accuracy of trajectory estimation. Given an initial position $(x_0, y_0)$, subsequent positions are computed recursively using the step length $L_k$ and the heading angle $\theta_k$:
$$x_{k+1} = x_k + L_k \cos\theta_k, \qquad y_{k+1} = y_k + L_k \sin\theta_k,$$
where $x_k$ and $y_k$ denote the current position coordinates and $k$ represents the step index. During walking, the magnitude of acceleration exhibits periodic fluctuations. For heading estimation, the angular velocity $\omega$ acquired from a triaxial gyroscope is integrated numerically to obtain a short-term heading estimate. Observations from a triaxial magnetometer $(m_x, m_y, m_z)$ are projected onto the horizontal plane using the pitch and roll angles, so that magnetometer-based correction compensates for the accumulated integration error of the gyroscope. Equation (3) converts the magnetometer measurement from the device coordinate system to the horizontal navigation coordinate system, where $X_h$ and $Y_h$ represent the X-direction and Y-direction magnetic field components on the horizontal plane, respectively. An adaptive weighting factor $\alpha$ balances the contributions of the two sensors in the fused heading.
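The dead-reckoning recursion and the gyroscope/magnetometer heading blend described in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the tilt-compensation sign convention, and the linear blending rule with a fixed `weight` are all assumed here, and the paper's adaptive rule for choosing the weight is not reproduced.

```python
import math


def pdr_step(x, y, step_length, heading):
    """Advance the 2D position by one step; heading is in radians from the x-axis."""
    return (x + step_length * math.cos(heading),
            y + step_length * math.sin(heading))


def tilt_compensated_heading(mx, my, mz, roll, pitch):
    """Rotate magnetometer readings into the horizontal plane, then take the heading.

    The sign convention is one common choice (assumed), not necessarily the paper's.
    """
    xh = (mx * math.cos(pitch)
          + my * math.sin(roll) * math.sin(pitch)
          + mz * math.cos(roll) * math.sin(pitch))
    yh = my * math.cos(roll) - mz * math.sin(roll)
    return math.atan2(-yh, xh)


def fuse_heading(gyro_heading, mag_heading, weight):
    """Blend two headings; weight=1.0 trusts the gyroscope, 0.0 the magnetometer."""
    # Blend on the wrapped difference so angles near +/-pi combine correctly.
    diff = math.atan2(math.sin(mag_heading - gyro_heading),
                      math.cos(mag_heading - gyro_heading))
    return gyro_heading + (1.0 - weight) * diff


# Four 0.7 m steps along a fused heading of 90 degrees.
x, y = 0.0, 0.0
heading = fuse_heading(math.pi / 2, math.pi / 2, weight=0.8)
for _ in range(4):
    x, y = pdr_step(x, y, 0.7, heading)
print(round(x, 3), round(y, 3))
```

Blending on the wrapped angular difference rather than on the raw angles avoids the discontinuity at ±180 degrees, which would otherwise average two nearly identical headings to roughly zero.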
3.3. Adaptive Unscented Kalman Filter Algorithm
The Kalman filter is an optimal recursive estimation algorithm based on Bayesian estimation theory, suitable for linear Gaussian systems. Its core concept is a prediction-update cycle that combines a system dynamics model with noisy measurement data to achieve minimum mean square error estimation of the state variables. The state and observation equations are given by Equations (6) and (7), respectively:
$$x_k = F_k x_{k-1} + w_{k-1} \tag{6}$$
$$z_k = H_k x_k + v_k \tag{7}$$
where $x_k$ and $z_k$ denote the state vector and observation vector at time $k$, $F_k$ is the state transition matrix, and $H_k$ is the observation matrix. The process noise is $w_k \sim \mathcal{N}(0, Q_k)$ and the measurement noise is $v_k \sim \mathcal{N}(0, R_k)$, where $Q_k$ and $R_k$ represent the process and measurement noise covariance matrices, respectively. Their statistical properties are specified in Formula (8).
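The prediction-update cycle can be illustrated with a scalar Kalman filter. The sketch below assumes a one-dimensional state so that the matrices reduce to plain numbers; the names `kf_predict` and `kf_update` and the sample measurements are illustrative only.

```python
def kf_predict(x, p, f, q):
    """Time update for x_k = f * x_{k-1} + w, with Var(w) = q."""
    return f * x, f * p * f + q


def kf_update(x_pred, p_pred, z, h, r):
    """Measurement update for z_k = h * x_k + v, with Var(v) = r."""
    innovation = z - h * x_pred
    s = h * p_pred * h + r        # innovation variance
    k = p_pred * h / s            # Kalman gain
    x = x_pred + k * innovation
    p = (1.0 - k * h) * p_pred
    return x, p


# Estimate a constant value from noisy measurements (made-up numbers).
x_est, p_est = 0.0, 1.0
for z in [1.2, 0.9, 1.1, 1.0]:
    x_est, p_est = kf_predict(x_est, p_est, f=1.0, q=0.0)
    x_est, p_est = kf_update(x_est, p_est, z, h=1.0, r=1.0)
print(round(x_est, 3), round(p_est, 3))
```

With zero process noise and equal prior and measurement variances, the first update moves the estimate halfway towards the measurement and halves the variance, which is a convenient sanity check for the gain computation.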
The unscented Kalman filter (UKF) addresses the nonlinear propagation of mean and covariance through the unscented transform (UT), which employs a deterministic sampling strategy [25]. A set of sigma points is generated and propagated through the nonlinear function, after which the mean and covariance of the transformed points are computed to approximate the output statistics. For a discrete-time nonlinear system, the UKF is formulated as follows:
$$x_k = f(x_{k-1}) + w_{k-1}, \qquad z_k = h(x_k) + v_k,$$
$$E[w_k w_j^{T}] = Q_k \delta_{kj}, \qquad E[v_k v_j^{T}] = R_k \delta_{kj}, \qquad E[w_k v_j^{T}] = 0,$$
where $f(\cdot)$ is the nonlinear state transition function, $h(\cdot)$ is the observation function, and $w_k$, $v_k$ are uncorrelated zero-mean Gaussian white noise processes. Here, $Q_k$ is a non-negative definite matrix and $R_k$ is a positive definite matrix, representing the covariance matrices of $w_k$ and $v_k$, respectively. The Kronecker delta function is denoted by $\delta_{kj}$.
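The UKF estimates the state of a nonlinear dynamic system through a sequence of structured steps. The process begins with the initialization of the state estimate and covariance matrix:
$$\hat{x}_0 = E[x_0], \qquad P_0 = E\big[(x_0 - \hat{x}_0)(x_0 - \hat{x}_0)^{T}\big],$$
where $x_0$ is the initial state vector and $P_0$ is a symmetric positive definite covariance matrix. Subsequently, a set of $2n+1$ sigma points is generated, where $n$ denotes the dimension of the state vector. Given the state mean $\hat{x}_{k-1}$ and covariance $P_{k-1}$ at time $k-1$, the sigma points are computed as
$$\chi_{0,k-1} = \hat{x}_{k-1}, \qquad \chi_{i,k-1} = \hat{x}_{k-1} + \big(\sqrt{(n+\lambda)P_{k-1}}\big)_i, \qquad \chi_{i+n,k-1} = \hat{x}_{k-1} - \big(\sqrt{(n+\lambda)P_{k-1}}\big)_i.$$
Here, $i = 1, \dots, n$, $(\cdot)_i$ denotes the $i$-th column of the matrix square root, and $\lambda$ is a scaling parameter that controls the spread of the sigma points around the mean. During the time update, each sigma point is propagated through the nonlinear process model:
$$\chi_{i,k|k-1} = f(\chi_{i,k-1}), \qquad i = 0, \dots, 2n.$$
The predicted state mean $\hat{x}_{k|k-1}$ and covariance $P_{k|k-1}$ are then computed as
$$\hat{x}_{k|k-1} = \sum_{i=0}^{2n} W_i^{(m)} \chi_{i,k|k-1}, \qquad P_{k|k-1} = \sum_{i=0}^{2n} W_i^{(c)} \big(\chi_{i,k|k-1} - \hat{x}_{k|k-1}\big)\big(\chi_{i,k|k-1} - \hat{x}_{k|k-1}\big)^{T} + Q,$$
where $Q$ is the process noise covariance matrix, and $W_i^{(m)}$ and $W_i^{(c)}$ are weights assigned to the mean and covariance calculations, respectively.
For the measurement update, the observation sigma points are generated and transformed using the measurement model:
$$Z_{i,k|k-1} = h(\chi_{i,k|k-1}).$$
The predicted measurement mean $\hat{z}_{k|k-1}$, innovation covariance $P_{zz}$, and cross-covariance $P_{xz}$ are calculated as follows:
$$\hat{z}_{k|k-1} = \sum_{i=0}^{2n} W_i^{(m)} Z_{i,k|k-1}, \qquad P_{zz} = \sum_{i=0}^{2n} W_i^{(c)} \big(Z_{i,k|k-1} - \hat{z}_{k|k-1}\big)\big(Z_{i,k|k-1} - \hat{z}_{k|k-1}\big)^{T} + R,$$
$$P_{xz} = \sum_{i=0}^{2n} W_i^{(c)} \big(\chi_{i,k|k-1} - \hat{x}_{k|k-1}\big)\big(Z_{i,k|k-1} - \hat{z}_{k|k-1}\big)^{T}.$$
Finally, the Kalman gain $K_k$ is computed, and the state estimate and covariance are updated:
$$K_k = P_{xz} P_{zz}^{-1}, \qquad \hat{x}_k = \hat{x}_{k|k-1} + K_k \big(z_k - \hat{z}_{k|k-1}\big), \qquad P_k = P_{k|k-1} - K_k P_{zz} K_k^{T}.$$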
This formulation ensures an efficient and accurate mechanism for state estimation in nonlinear systems through sigma point propagation and statistical linearization.
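The sigma-point generation and the weighted recombination at the heart of the unscented transform can be sketched in plain Python. The sketch makes several assumptions that the text does not specify: the scaled parameterization with `alpha`, `beta`, and `kappa` (and their default values), the use of a Cholesky factor as the matrix square root, and all function names.

```python
import math


def cholesky(a):
    """Lower-triangular factor low with low * low^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sigma_points(mean, cov, alpha=1.0, beta=2.0, kappa=1.0):
    """Return 2n+1 sigma points with their mean and covariance weights."""
    n = len(mean)
    lam = alpha ** 2 * (n + kappa) - n
    root = cholesky([[(n + lam) * c for c in row] for row in cov])
    columns = [[root[r][i] for r in range(n)] for i in range(n)]
    points = [list(mean)]
    points += [[m + c for m, c in zip(mean, col)] for col in columns]
    points += [[m - c for m, c in zip(mean, col)] for col in columns]
    wm = [lam / (n + lam)] + [1.0 / (2.0 * (n + lam))] * (2 * n)
    wc = list(wm)
    wc[0] += 1.0 - alpha ** 2 + beta
    return points, wm, wc


def unscented_transform(points, wm, wc, func):
    """Propagate sigma points through func and recombine mean and covariance."""
    ys = [func(p) for p in points]
    m = len(ys[0])
    mean = [sum(w * y[d] for w, y in zip(wm, ys)) for d in range(m)]
    cov = [[sum(w * (y[a] - mean[a]) * (y[b] - mean[b]) for w, y in zip(wc, ys))
            for b in range(m)] for a in range(m)]
    return mean, cov


# A nonlinear example: convert (range, bearing) to Cartesian coordinates.
pts, wm, wc = sigma_points([10.0, 0.5], [[0.04, 0.0], [0.0, 0.01]])
xy_mean, xy_cov = unscented_transform(
    pts, wm, wc, lambda p: [p[0] * math.cos(p[1]), p[0] * math.sin(p[1])])
print([round(v, 3) for v in xy_mean])
```

For a linear (here, identity) function the transform reproduces the input mean and covariance exactly, which is the property the test below relies on; process or measurement noise would be added to the returned covariance separately.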
The adaptive unscented Kalman filter (AUKF) [26] incorporates a mechanism for adaptive tuning of the noise covariance matrices. The algorithm proposed in this paper further integrates an adaptive adjustment strategy based on the statistical characteristics of the innovation sequence, enabling dynamic optimization of the filtering parameters in response to real-time changes in pedestrian motion. The adaptive factor is computed from the norm of the normalized innovation sequence and implemented as a piecewise function for gradual adjustment. The normalized innovation sequence is obtained by normalizing the original innovation sequence, and a piecewise adaptive factor is designed based on the norm of the normalized sequence. This function facilitates progressive adjustment of the filtering gain: when the innovation sequence exhibits normal statistical characteristics, standard filtering performance is maintained; when minor model mismatch is detected, the weight of the measurement information is reduced proportionally; and in the presence of severe anomalies, the filter relies entirely on the predicted information, which mitigates the impact of abnormal observations on localization accuracy. The process noise covariance is updated adaptively, and the measurement noise covariance is adjusted in a similar way.
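The three regimes of the adaptive factor (normal, minor mismatch, severe anomaly) can be illustrated with a small piecewise function. The exact form used in the paper is not available here, so everything specific in this sketch is an assumption: the thresholds `c0` and `c1`, the linear taper between them, the per-component normalization by innovation variance, and the way the factor scales the gain.

```python
import math


def normalized_innovation_norm(innovation, innovation_var):
    """Norm of the innovation after scaling each component by its standard deviation.

    Assumes a diagonal innovation covariance (per-component variances).
    """
    return math.sqrt(sum(e * e / s for e, s in zip(innovation, innovation_var)))


def adaptive_factor(norm, c0=1.5, c1=4.0):
    """Hypothetical three-segment factor in [0, 1]; thresholds are illustrative."""
    if norm <= c0:
        return 1.0                      # normal innovation: standard update
    if norm <= c1:
        return (c1 - norm) / (c1 - c0)  # minor mismatch: taper the measurement weight
    return 0.0                          # severe anomaly: rely on the prediction only


def scaled_update(x_pred, gain, innovation, factor):
    """Scalar state update with the gain scaled by the adaptive factor."""
    return x_pred + factor * gain * innovation


for innovation in (0.5, 2.75, 9.0):
    norm = normalized_innovation_norm([innovation], [1.0])
    factor = adaptive_factor(norm)
    print(innovation, factor, scaled_update(0.0, 0.5, innovation, factor))
```

The taper is continuous at both thresholds, so the filter does not switch abruptly between trusting and rejecting a measurement.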
3.4. The Proposed Gait-AUKF Algorithm for Localization
This paper proposes a gait phase-constrained adaptive unscented Kalman filter (AUKF) [27] localization algorithm that achieves high-precision pedestrian localization by fusing INS data, GNSS measurements, and human gait characteristics [28]. First, a 16-dimensional state space model is established, encompassing position, velocity, attitude, and sensor biases. Second, a gait phase detection method based on acceleration signals is designed to accurately identify the stance and swing phases. Then, an adaptive UKF framework is constructed, in which the process and measurement noise covariances are dynamically adjusted by monitoring the innovation sequence. Finally, a phase-dependent constraint weight adjustment mechanism is introduced to apply velocity update constraints during different gait phases. Experimental results demonstrate that the proposed algorithm effectively suppresses drift errors from IMU sensors and maintains high localization accuracy even during GNSS signal outages.
Figure 3 depicts the three-axis acceleration changes that transition between dynamic and static during gait.
The algorithm employs a 16-dimensional state vector to represent the pedestrian's motion state, including the position vector $p \in \mathbb{R}^3$, velocity vector $v \in \mathbb{R}^3$, attitude quaternion $q \in \mathbb{R}^4$, accelerometer bias $b_a \in \mathbb{R}^3$, and gyroscope bias $b_g \in \mathbb{R}^3$.
The measurement inputs include INS data, GNSS position data, and acceleration based gait phase detection. The GNSS observation model measurement equation is as follows:
Gait discrimination distinguishes two phases: the standing (stance) phase and the swing phase. The zero velocity update (ZUPT) mechanism is activated when the standing phase is detected, constraining the current velocity state to a zero vector and effectively reducing velocity drift. The gait phase function gives the gait phase at time $t$ in terms of the acceleration at that time and the gait angular frequency. When the standing phase is detected, the zero velocity constraint $v = 0$ is applied.
During the swing phase, gait periodicity is used to establish a motion model related to forward speed or gait phase. The model involves the estimated velocity along the x-axis of the carrier coordinate system, a term that selects or combines specific components of the three-dimensional velocity vector, gait-related model parameters, and the rotation matrix from the navigation coordinate system to the carrier coordinate system. The resulting constraint is incorporated into the PDR and AUKF algorithms for multi-source observation fusion, and the fused estimate is fed back to the motion model, forming a closed loop for pedestrian localization.
In contrast to the conventional UKF and even the standard AUKF, the proposed Gait-AUKF algorithm introduces a gait phase constraint mechanism that integrates PDR to estimate pedestrian trajectories. This allows the system to maintain localization accuracy even during GNSS outages. By incorporating gait phase detection, the algorithm provides additional reference information for trajectory prediction. It adaptively adjusts the noise statistical model to accommodate the diversity of pedestrian motion patterns, effectively handling uncertainties in complex scenarios such as gait transitions and turning maneuvers, thereby significantly enhancing localization accuracy in challenging environments.
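A simple way to picture stance detection followed by a zero velocity update is sketched below. This is not the paper's gait phase function: it substitutes a windowed variance test on the acceleration magnitude, and the window length, the threshold, and the hard reset of the velocity (rather than a filter constraint) are all assumptions made for illustration.

```python
import math


def detect_stance(acc_samples, window=5, var_threshold=0.05):
    """Flag samples whose local variance of |a| is below a threshold (assumed rule)."""
    mags = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in acc_samples]
    flags = []
    for i in range(len(mags)):
        lo = max(0, i - window // 2)
        hi = min(len(mags), i + window // 2 + 1)
        chunk = mags[lo:hi]
        mean = sum(chunk) / len(chunk)
        var = sum((m - mean) ** 2 for m in chunk) / len(chunk)
        flags.append(var < var_threshold)
    return flags


def apply_zupt(velocity, is_stance):
    """Reset the velocity to zero during stance; leave it unchanged during swing."""
    return (0.0, 0.0, 0.0) if is_stance else velocity


# Ten quiet samples (foot on the ground) followed by ten oscillating samples (swing).
samples = [(0.0, 0.0, 9.81)] * 10
samples += [(0.0, 0.0, 9.81 + 3.0 * (-1) ** i) for i in range(10)]
stance = detect_stance(samples)
print(stance[0], stance[15])
print(apply_zupt((0.3, 0.1, 0.0), stance[0]))
```

In a full filter the zero velocity condition would normally enter as a pseudo-measurement with its own noise covariance, so that the correction also propagates to the position and bias states.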
3.5. A Multi-Source Attention Framework for Pedestrian Trajectory Prediction and Planning
3.5.1. Framework Overview
This study proposes a hybrid framework for pedestrian trajectory prediction in complex dynamic environments that integrates a multi-source fusion attention mechanism, a long short-term memory (LSTM) network, and the A* path planning algorithm. The model builds on the front-end Gait-AUKF algorithm, which estimates the pedestrian trajectory and gait information in real time. Pedestrian trajectory prediction is achieved by exploiting the intrinsic correlation between leg movement features and high-level behavioral intentions.
In terms of architecture, the model exploits the ability of the attention mechanism to capture long-range spatiotemporal dependencies and the ability of the LSTM network to maintain a memory of the motion state. By analyzing temporal evolution characteristics such as gait cycle and step length, the system can model the probability distribution of pedestrian intention. On this basis, an A* path planning algorithm is introduced for collaborative reasoning: guided by the pedestrian heading provided by the front end, it performs a heuristic search in the environmental map and generates an optimal or suboptimal geometric path that conforms to physical constraints such as obstacles and road structures. The environmental map is built from map data of the experimental area by vectorizing and annotating information such as road boundaries, pedestrian crossings, and fixed obstacles. The output of the prediction module then undergoes dynamic collaborative evaluation with the planned path, ultimately generating a trajectory that combines behavioral authenticity and physical feasibility. The pedestrian trajectory prediction architecture and data flow diagram are shown in Figure 4.
As shown in Figure 4, the data flow of the algorithm begins with multi-source perception data from the wearable devices, which the Gait-AUKF module receives and processes to generate a 16-dimensional state vector and a gait feature vector. The state vector includes three-dimensional position, three-dimensional velocity, quaternion attitude, and sensor bias estimates; the sensor error estimates are used for adaptive parameter adjustment within the module. The original 16-dimensional state undergoes dimension and format conversion and is fed to the GRU encoder as a time series, with acceleration calculated from velocity differences. The GRU encoder compresses the temporal feature sequence into a fixed-length encoding vector, which serves as the query state and is sent to the attention mechanism module. There it interacts with the key-value encoder of the historical memory to output a weighted context vector. The LSTM decoder receives this data stream and outputs the final predicted trajectory under the physical constraints of the A* path planner.
3.5.2. Gated Recurrent Unit Encoding Mechanism
The motion feature encoder is the temporal feature extraction module of the prediction model. It takes the real-time multi-dimensional pedestrian state sequence output by Gait-AUKF as input, and its core is a recurrent encoding network composed of gated recurrent units (GRUs). The encoder uses the gating mechanism of the GRU to adaptively integrate the short-term dynamics and long-term trends of pedestrian trajectories, thereby constructing a hierarchical motion feature representation. The input is the feature vector $x_t = [p_t, v_t, a_t, g_t]$ for each time step, where $p_t$ denotes the 2D planar coordinates; $v_t$ and $a_t$ represent the instantaneous velocity and acceleration vectors, respectively; and $g_t$ corresponds to the gait feature vector. The GRU controls the flow of information through reset and update gates, as shown in Figure 5.
The reset gate reduces the influence of the historical state when an abrupt gait change is detected, making the candidate state depend more on the current input and respond to instantaneous changes. When the motion trend is stable, the update gate tends to retain most of the historical state, thereby maintaining the continuity of the motion direction and supporting long-term path modeling. The encoder outputs the hidden state of the last time step as the representation of the entire sequence. This vector integrates multi-level information, from subtle gait adjustments to macroscopic motion trends, and is used as input to the subsequent multi-source fusion attention module for further spatiotemporal correlation and future trajectory inference.
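The roles of the reset and update gates can be made concrete with a single GRU step written in plain Python. This is a generic GRU cell, not the trained encoder from the paper: the parameter names in the dictionary, the hidden size, and the convention that an update gate close to 1 keeps the previous state are assumptions (the opposite gate convention also appears in the literature).

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w, x):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def gru_step(x, h, p):
    """One GRU step; p holds W_* (hidden x input), U_* (hidden x hidden), b_* (hidden)."""
    def affine(gate, hidden):
        wx = matvec(p["W_" + gate], x)
        uh = matvec(p["U_" + gate], hidden)
        return [a + b + c for a, b, c in zip(wx, uh, p["b_" + gate])]

    z = [sigmoid(v) for v in affine("z", h)]        # update gate
    r = [sigmoid(v) for v in affine("r", h)]        # reset gate
    gated = [ri * hi for ri, hi in zip(r, h)]       # history seen by the candidate
    cand = [math.tanh(v) for v in affine("h", gated)]
    # z close to 1 keeps the old state; z close to 0 adopts the candidate.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]


def encode(sequence, p, hidden_size):
    """Run the GRU over a sequence and return the last hidden state."""
    h = [0.0] * hidden_size
    for x in sequence:
        h = gru_step(x, h, p)
    return h


# Toy parameters: 2 input features, 1 hidden unit (values are arbitrary).
params = {
    "W_z": [[0.1, -0.2]], "U_z": [[0.3]], "b_z": [0.0],
    "W_r": [[0.2, 0.1]], "U_r": [[-0.1]], "b_r": [0.0],
    "W_h": [[0.5, 0.5]], "U_h": [[0.4]], "b_h": [0.0],
}
print(encode([[0.1, 0.0], [0.2, 0.1], [0.3, 0.1]], params, hidden_size=1))
```

Because the reset gate multiplies the previous state before it enters the candidate, a small reset value lets the candidate depend almost entirely on the current input, which matches the behavior described for abrupt gait changes.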
3.5.3. Design of the Attention-Based Addressing Mechanism
The memory enhanced attention addressing module is the core module for implementing historical experience retrieval in this framework. Its function is to establish semantic similarity mapping between the currently observed features and compressed historical memory features, thereby providing interpretable historical references for prediction. This module receives two input sources: first, the feature vector output by the motion feature encoder that fuses the current time period’s motion and intent, and second, the key value set constructed by compressing and encoding the features of all historical time steps. The key vector is used for similarity addressing, and the value vector stores the corresponding trajectory context information.
This module adopts a dual-encoder architecture that maps the target pedestrian's historical trajectory set and action features into the same semantic space and combines the spatiotemporal relationship between action patterns and historical trajectories to predict the pedestrian's future trajectory. The query encoder takes pedestrian motion feature vectors as input and establishes a nonlinear mapping based on them. The key-value encoder takes pedestrian trajectory data as input, learns the periodic motion patterns in the trajectory, and captures long-term temporal dependencies. The query encoder and key-value encoder are multi-layer perceptrons with parameter sharing, whose weights are learnable parameters. The similarity score $s_i$ between the query $q$ and the $i$-th key $k_i$ is given by Formula (35); since both $q$ and $k_i$ are normalized, Formula (35) can be computed as a dot product.
The similarity scores are normalized into attention weights to achieve soft addressing of the memory, where $\alpha_i$ represents the attention weight of the $i$-th memory.
The retrieved context vector $c$ is a weighted aggregation of the memory values, $c = \sum_i \alpha_i v_i$.
The obtained vector integrates the historical experience most relevant to the current state. This vector will be concatenated with the output of the GRU encoder to form an enhanced fusion feature.
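The soft addressing described in this subsection (similarity scoring, normalization into weights, weighted aggregation of values) can be sketched as follows. The sketch assumes cosine similarity on L2-normalized vectors and a softmax for the normalization; the learned query and key-value encoders are omitted, and the function names and toy memory are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Return attention weights and the context vector for one query."""
    q = l2_normalize(query)
    # With unit-length vectors, the dot product equals the cosine similarity.
    scores = [sum(a * b for a, b in zip(q, l2_normalize(k))) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


# Two memory slots: the first key matches the query direction, the second is orthogonal.
weights, context = attend(
    query=[2.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 1.0], [3.0, 3.0]],
)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

The resulting context vector would then be concatenated with the encoder output, as the text describes, before being passed to the decoder.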
3.5.4. Collaboration Between LSTM Decoder Trajectory Prediction and A* Path Planning
This framework adopts a collaborative mechanism between the LSTM decoder and A* path planning [29] to generate trajectory predictions that conform to pedestrian behavior patterns while ensuring physical feasibility and intention orientation. The LSTM decoder [30] takes the feature representation obtained in the encoding stage as its initial state and generates the future trajectory sequence step by step through recursion. The schematic diagram of the recursive calculation is shown in Figure 6.
The A* path planning algorithm provides physically feasible path guidance for the LSTM decoder through heuristic search. The algorithm takes the current pedestrian position as the starting point, determines the target point from the orientation information, and searches the environment grid map for the optimal path that avoids obstacles. The core evaluation function balances the actual cost with a heuristic estimate: $f(n) = g(n) + h(n)$, where $g(n)$ is the actual cumulative cost from the starting point to node $n$, and $h(n)$ is the Euclidean distance heuristic function. The algorithm outputs a sequence of path points that serves as intention guidance, representing the pedestrian's movement towards the target implied by the path. The trajectory prediction generated by the LSTM decoder is fused with the physically feasible path planned by the A* path planning algorithm through a collaborative weighting mechanism.
The final trajectory coordinate is obtained by adaptively fusing the coordinates predicted by the LSTM decoder and the coordinates planned by the A* path planning algorithm through weighted fusion. The weights are calculated by a sigmoid function of the prediction variance and the difference between prediction and planning.
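The grid search with the evaluation function f(n) = g(n) + h(n) and the sigmoid-weighted fusion can be sketched together. The A* part is a standard 4-connected grid search with a Euclidean heuristic; the fusion part is only a guess at the general shape of the mechanism, since the paper's weighting formula is not available here. The coefficients `k_var`, `k_gap`, and `bias`, and the choice to let the planner weight grow with prediction variance and with the prediction-planning gap, are assumptions.

```python
import heapq
import math


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid where 1 marks an obstacle; None if blocked."""
    rows, cols = len(grid), len(grid[0])

    def h(node):
        return math.hypot(node[0] - goal[0], node[1] - goal[1])

    open_heap = [(h(start), 0.0, start)]   # entries are (f, g, node)
    g_cost = {start: 0.0}
    parent = {start: None}
    closed = set()
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node in closed:
            continue
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        closed.add(node)
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g + 1.0
                if new_g < g_cost.get(nxt, math.inf):
                    g_cost[nxt] = new_g
                    parent[nxt] = node
                    heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None


def fuse(pred_xy, plan_xy, pred_var, k_var=1.0, k_gap=1.0, bias=-1.0):
    """Blend predicted and planned coordinates with a sigmoid weight (assumed form)."""
    gap = math.hypot(pred_xy[0] - plan_xy[0], pred_xy[1] - plan_xy[1])
    w_plan = 1.0 / (1.0 + math.exp(-(k_var * pred_var + k_gap * gap + bias)))
    fused = tuple((1.0 - w_plan) * p + w_plan * q for p, q in zip(pred_xy, plan_xy))
    return fused, w_plan


grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
print(path)
print(fuse((0.0, 0.0), (2.0, 0.0), pred_var=0.5))
```

With unit step costs on a 4-connected grid, the Euclidean distance never overestimates the remaining cost, so the heuristic is admissible and the returned path is a shortest one.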