UWB Positioning in Complex Indoor Environments Based on UKF–BiLSTM Bidirectional Mutual Correction

Wang, Yiwei; Dong, Zengshou

doi:10.3390/electronics15030687


by Yiwei Wang ¹,² and Zengshou Dong ¹,²,*

¹ School of Electronic and Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China

² Shanxi Province Industrial Digitization and Digital Asset Technology Innovation Center, Taiyuan 030024, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(3), 687; https://doi.org/10.3390/electronics15030687 (registering DOI)

Submission received: 29 December 2025 / Revised: 30 January 2026 / Accepted: 3 February 2026 / Published: 5 February 2026


Abstract

Non-line-of-sight (NLOS) propagation remains a major obstacle to high-accuracy ultra-wideband (UWB) indoor positioning. To address this issue, this study investigates solutions from two complementary perspectives: NLOS identification and error mitigation. For identification, an NLOS signal classification model is proposed based on multidimensional statistics of the channel impulse response (CIR). The model incorporates an attention mechanism and an improved snake optimization (ISO) algorithm, achieving significantly enhanced classification accuracy and robustness. For error mitigation, a UKF–BiLSTM bidirectional mutual calibration framework is proposed to dynamically compensate for NLOS errors. The framework embeds the constant turn rate and velocity (CTRV) motion model within an unscented Kalman filter (UKF) to enhance trajectory modeling and establishes a bidirectional correction loop with a bidirectional long short-term memory (BiLSTM) network. Through the synergy of physical constraints and data-driven learning, the framework adaptively suppresses NLOS errors. Experimental results show that the proposed framework achieves performance comparable to the state of the art with improved model efficiency in complex indoor UWB positioning scenarios.

Keywords:

ultra-wideband (UWB); non-line-of-sight (NLOS); unscented Kalman filter (UKF); bidirectional long short-term memory (BiLSTM); indoor positioning

1. Introduction

In indoor environments, Global Positioning System (GPS) signals are often unavailable or unreliable. Consequently, a variety of indoor positioning technologies have been developed, including Wi-Fi [1], Bluetooth [2], ZigBee [3], and ultra-wideband (UWB) [4]. Among these technologies, UWB offers high data rates, low power consumption, and fine time-domain resolution, making it particularly suitable for high-precision indoor positioning. As a result, UWB has been widely applied in practical scenarios such as industrial monitoring [5] and drone positioning [6].

However, in practical indoor environments, UWB positioning systems are inevitably affected by non-line-of-sight (NLOS) propagation. When obstacles block the direct path between the transmitter and receiver, signals propagate through reflected or diffracted paths, leading to increased propagation time and positively biased ranging errors [7]. These NLOS-induced errors significantly degrade positioning accuracy and may even result in positioning failure in complex scenarios, thereby limiting the overall performance of UWB-based indoor positioning systems. Therefore, effective identification of NLOS conditions and suppression of their adverse effects are critical for improving the accuracy of UWB indoor positioning.

To address this challenge, recent studies have explored the integration of deep learning (DL) techniques with traditional filtering algorithms to mitigate NLOS errors. Tian et al. [8] proposed a KF–LSTM framework, in which Kalman filtering (KF) constrains physical-state estimation while a long short-term memory (LSTM) network models measurement errors, thereby demonstrating the feasibility of such hybrid approaches. Eang et al. [9] proposed a fusion framework combining a deep neural network (DNN) and an extended Kalman filter (EKF). In this framework, the DNN estimates ranging biases, and the EKF updates the state. Zhou et al. [10] further proposed a CNN–LSTM–DEKF method, in which a cascaded CNN–LSTM architecture is employed for NLOS identification, and the resulting classification information is incorporated into a distributed EKF (DEKF) to enable high-precision positioning under specific scenarios. To relax the limitations imposed by linear assumptions, Zhang et al. [11] designed a UKF–FNN–RIC framework. By integrating a feedforward neural network (FNN) with a redundant information correction (RIC) mechanism, the proposed framework enables progressive coupling with nonlinear physical models. As a result, millimeter- and centimeter-level accuracy can be achieved for static positioning and simple trajectory-tracking tasks. Overall, these studies represent essential advances in fusion-based NLOS mitigation; however, two fundamental limitations remain. First, KF/EKF/DEKF-based approaches rely on linearization assumptions, which make them difficult to generalize to complex dynamic scenarios with highly nonlinear motion characteristics. Second, while UKF-based frameworks alleviate linearity constraints, most adopt a unidirectional fusion paradigm. This approach lacks dynamic bidirectional feedback between physical models and DL components, which limits adaptability to complex, time-varying environments.

To overcome these limitations and achieve effective integration of physical constraints and data-driven learning, this paper proposes a unified framework that jointly addresses NLOS identification and NLOS error mitigation. The main contributions are summarized as follows:

(1): An enhanced classification model is proposed by integrating multi-head self-attention (MHSA) with an improved snake optimization (ISO) algorithm. The self-attention mechanism captures correlations among channel impulse response (CIR) features, while the optimization strategy adaptively searches for optimal network parameters, thereby improving classification accuracy and model robustness.
(2): A UKF–BiLSTM model with a bidirectional mutual calibration mechanism is proposed for NLOS error mitigation. The constant turn rate and velocity (CTRV) motion model is adopted to enhance the trajectory modeling capability of the UKF. Specifically, the UKF supplies physically constrained initial estimates to the BiLSTM. In turn, the BiLSTM leverages historical residuals to dynamically calibrate the UKF’s measurement noise statistics. This bidirectional interaction enables adaptive suppression of NLOS-induced errors under complex and time-varying conditions.

2. Related Work

2.1. Physics-Based Methods

In physical model-based UWB positioning research, filtering methods have long played a dominant role. Among these, the KF—one of the earliest dynamic estimation frameworks—assumes linear-Gaussian distributions and is therefore ill-suited for nonlinear ranging processes in complex indoor environments [12]. To address such nonlinearity, the EKF has been widely adopted by applying first-order linearization to nonlinear systems; however, such linearization inevitably introduces limitations, particularly for nonlinear ranging models in complex environments [13,14]. To overcome these drawbacks, the unscented Kalman filter (UKF) employs the unscented transformation (UT), which avoids explicit Jacobian calculations and often provides improved performance in nonlinear systems. In addition to these filtering methods, optimization-based enhancements have been developed to improve robustness. For example, Lyu et al. [15] combined an improved particle swarm optimization (IPSO) algorithm with an adaptive UKF that updates the noise covariance matrix in real time to mitigate NLOS errors. Likewise, Feng et al. [16] developed an EKF–UKF fusion framework that integrates improved ranging techniques with geometric dilution of precision (GDOP) optimization, enabling complementary updates from multiple information sources. In parallel, particle filtering (PF)—a representative nonparametric Bayesian estimation method—naturally handles nonlinear and non-Gaussian noise and has been widely applied to UWB indoor positioning [17]. As an example, Han et al. [18] proposed an adaptive ant colony optimization particle filter (AACOPF) for joint estimation of position and heading for a single mobile node, combining IMU/UWB fusion with dynamic weight adjustment. Despite these advances, most existing methods assume that auxiliary nodes are in motion; when such nodes are static, the system may become unobservable, thereby limiting practical applicability. 
In summary, filtering-based approaches offer strong interpretability and real-time performance, but their effectiveness depends heavily on predefined physical and noise models. In complex NLOS environments, modeling non-Gaussian and time-varying error characteristics remains challenging. Additionally, filtering-based methods often struggle to capture long-term temporal dependencies. These limitations have motivated increasing interest in data-driven approaches.

2.2. Deep Learning-Based NLOS Mitigation

Early studies primarily explored the application of CNNs and LSTM networks to UWB positioning, which evolved along two distinct technical pathways. For temporal modeling, Poulose et al. [19] employed a two-layer LSTM network to process time-of-arrival (TOA) ranging sequences. They achieved a mean positioning error of 7 cm in indoor line-of-sight (LOS) simulation scenarios. This result highlighted the potential of deep learning for modeling dynamic trajectories. However, the method is limited to ideal LOS conditions and requires a relatively long training time. From a feature representation perspective, Nguyen et al. [20] converted UWB signals into three-channel RGB images and employed CNNs for end-to-end localization. This approach achieved meter-level accuracy across multiple channel models. Nevertheless, it relies heavily on high signal-to-noise ratios, suffers from increased errors in suburban NLOS environments, and incurs high inference latency. In summary, these pioneering studies demonstrated the potential of temporal and feature-based learning for UWB positioning. They also revealed fundamental limitations of early DL approaches, notably their heavy reliance on simulated data and ideal operating conditions. To address these limitations, subsequent research has shifted toward network architectures that explicitly model the geometric structure inherent in localization tasks. He et al. [21] introduced a spatio-temporal graph neural network (STA-GNN) framework for UWB positioning. This framework explicitly models the geometric topological relationships between anchors and tags. In real indoor environments, their framework achieved positioning errors of 4.7–22.4 cm. By leveraging efficient graph convolution operations, it attained a real-time positioning frequency of 10 Hz on embedded platforms. 
Although the aforementioned CNN- and LSTM-based methods can achieve millisecond-level inference in simulation environments, STA-GNN is among the first to show that DL-based localization can satisfy system-level real-time requirements in complex real-world scenarios. This advancement marks a transition from algorithm-level validation in simulation toward practical system-level deployment.

Building on the success of Transformer architectures in sequence modeling, researchers have applied attention mechanisms to improve both the accuracy and robustness of UWB positioning. Tang et al. [22] proposed a deep attention network that integrates a Transformer encoder with a GRU module, along with a geometric loss function to enforce sensor constraints. In real-world complex scenarios, this approach reduces ranging errors to approximately 0.1–0.2 m and achieves single-sample inference in just 0.004 s. This provides a favorable balance between accuracy and efficiency. However, its performance is highly scenario-specific, requiring retraining when anchor layouts change, and thus it suffers from limited transferability. To enhance environmental adaptability, subsequent efforts have focused on the tight integration of Transformers with domain knowledge. Yang et al. [23] further proposed the F-BERT framework, which deepens this integration by combining Transformers with fuzzy logic for fine-grained distance correction. The framework achieves centimeter-level positioning accuracy in dynamic NLOS scenarios. It experiences only a 6.56% performance drop across different environments, demonstrating strong adaptability and robustness.

In summary, DL continues to push the boundaries of accuracy through architectural innovations. However, its purely data-driven nature presents two fundamental limitations. First, model predictions lack guidance from physical kinematic constraints, leading to non-physical outputs in complex dynamic trajectories. Second, performance heavily depends on the distribution of the training data, limiting cross-scenario generalization. Therefore, the tight integration of data-driven learning with physical model constraints has emerged as a critical pathway for further advances in performance and robustness.

2.3. Hybrid Model-Based NLOS Mitigation

To overcome the limitations of model-driven and data-driven approaches, hybrid paradigms that marry their strengths are now central to high-precision UWB positioning research. Existing studies can be broadly categorized by the degree of integration, revealing a progression from simple combinations to deeply collaborative frameworks.

Early studies predominantly adopted unidirectional, open-loop fusion strategies. For instance, Tian et al. [8] employed a KF as a pre-processing stage for an LSTM network to denoise input measurements. This strategy improves accuracy in ideal LOS scenarios. However, due to a lack of interaction between the filtering module and the neural network, it exhibits limited adaptability in complex, dynamic NLOS environments.

To further mitigate multi-source errors, subsequent research evolved toward multi-stage, cascaded fusion architectures. Zhang et al. [11] proposed a cascaded combination of a UKF, an FNN, and an RIC module, while Wang et al. [24] extended this framework by embedding the Chan algorithm between the FNN and RIC modules. This design stabilizes the conversion of time-difference-of-arrival (TDOA) measurements into absolute distances, mitigating the instability inherent to conventional implicit-function solutions. Such frameworks use multiple specialized modules to sequentially mitigate random and systematic errors during geometric optimization. This enables millimeter- to centimeter-level positioning accuracy in static and low-speed dynamic scenarios. Nonetheless, their performance remains highly dependent on predefined processing pipelines, and the complex serial structure often incurs significant computational overhead.

More recently, research has shifted toward end-to-end, physically informed architectures that explicitly reflect the underlying physics of localization. A representative approach enhances key components of physical filters using DL networks. For example, Ren et al. [25] employed an attention-enhanced LSTM network to generate pseudo-observations for the KF under NLOS conditions, thereby incorporating data-driven information into the state estimation process. Another approach reconstructs the entire system using network architectures aligned with physical relationships. Muthineni et al. [26] adopted graph neural networks (GNNs) to model anchors and tags as nodes in a graph. Through message passing, multimodal data are naturally fused while geometric constraints among sensors are explicitly encoded. Nevertheless, the mechanisms for collaboration remain inherently constrained. Attention-based LSTM methods still act as unidirectional substitutes for physical filters. In contrast, end-to-end architectures such as GNNs do not integrate a concurrently operating physical state estimator to provide real-time physical constraints and corrective feedback.

In summary, existing hybrid methods have improved accuracy and robustness. However, they generally lack adaptive interaction mechanisms between physical and data-driven modules. This absence prevents the formation of a unified closed-loop optimization framework. To address this gap, this paper proposes a novel fusion framework for state estimation that enables real-time, closed-loop integration of physical constraints and data-driven learning.

3. NLOS Identification Model

3.1. Analysis of Parameters Related to NLOS Recognition

Feature extraction was performed on CIR waveforms for NLOS identification. Following the guidelines in the DW1000 User Manual [27], several commonly used CIR features were initially considered, including first-path amplitudes (FP_AMP1–FP_AMP3), noise-related statistics (maximum noise amplitude and noise standard deviation), and the CIR power (CIR_PWR). To reduce feature redundancy and retain the most informative inputs, a random forest (RF) model was employed to evaluate feature importance, and the ten most significant features were selected as the final input set. This procedure reduces the input dimensionality while preserving the discriminative characteristics required for subsequent NLOS classification.

Figure 1 compares CIR waveforms under LOS and NLOS conditions, zoomed into the sample interval from 720 to 820. Several key regions and feature points related to local time-domain characteristics are annotated:

(1): Direct path peak: under LOS conditions, a single sharp peak appears at approximately sample 760, with concentrated energy and rapid decay. This peak corresponds to the dominant direct path and represents a typical time-domain characteristic of LOS propagation.
(2): Multipath components: in NLOS conditions, multiple low-amplitude peaks are distributed across the 750–800 sample range, indicating dispersed energy caused by rich multipath propagation.
(3): First path (FP_AMP1): the first path is marked as the earliest significant peak in the LOS waveform, corresponding to the earliest arriving signal component. Its amplitude is therefore selected as an important discriminative feature.
(4): Excess delay spread: under NLOS conditions, the CIR waveform exhibits an extended trailing response within the 760–780 sample interval, indicating temporal dispersion caused by signal propagation through reflected and diffracted paths.
(5): Noise floor region: the pre-arrival interval from samples 720 to 740 is designated as the noise floor region, from which noise-related statistical features, such as maximum noise amplitude and noise standard deviation, are extracted.
(6): Total power feature (CIR_PWR): CIR_PWR integrates the entire CIR waveform to represent the overall channel energy.

These waveform differences arise from multipath effects caused by reflections and diffractions under NLOS propagation. They provide the physical basis for extracting features such as peak amplitudes, delay spread, multipath distribution, and noise-related statistics.
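The feature regions annotated above can be turned into numbers with a few lines of code. The following is a minimal sketch in Python, chosen here only for illustration. The function name `cir_features`, the synthetic waveform, and the use of the sum of squared magnitudes as the total-power feature are assumptions of this sketch and are not taken from the paper or from the DW1000 register definitions. The noise window follows the 720–740 sample interval described for Figure 1.

```python
import statistics

def cir_features(cir, noise_start=720, noise_end=740):
    """Return simple time-domain features from a list of CIR magnitudes."""
    # Pre-arrival window used as the noise floor region (samples 720-740).
    noise = cir[noise_start:noise_end]
    # Index of the strongest sample, used here as a stand-in for the main peak.
    peak_index = max(range(len(cir)), key=lambda i: cir[i])
    return {
        "noise_max": max(noise),
        "noise_std": statistics.pstdev(noise),
        "peak_index": peak_index,
        "peak_amp": cir[peak_index],
        # Assumed energy definition: sum of squared magnitudes over the waveform.
        "total_power": sum(v * v for v in cir),
    }

# Synthetic LOS-like waveform: flat noise floor plus one sharp peak at sample 760.
cir = [10.0] * 1000
cir[760] = 5000.0
features = cir_features(cir)
```

On a real capture the same function would be applied to the magnitude of each complex CIR sample; the window bounds would need to follow the reported first-path index rather than fixed sample numbers.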

Figure 2 compares the distributions of FP_AMP1–FP_AMP3 under LOS and NLOS conditions. FP_AMP1 corresponds to the earliest significant peak (see Figure 1) and is typically higher in LOS scenarios, reflecting the strength of the direct path. FP_AMP2 and FP_AMP3 denote the subsequent two multipath peaks. Together with FP_AMP1, they form the early multipath sequence, which characterizes multipath propagation in NLOS environments. Figure 2a–c show that the medians of FP_AMP1–FP_AMP3 are higher under LOS than under NLOS. Outliers in the LOS distribution (Figure 2b) may result from strong reflections. LOS paths experience lower attenuation, whereas NLOS signals are affected by multipath propagation. The LOS/NLOS median ratio quantifies the early-path amplitude difference and reflects direct-path strength, which justifies including these amplitudes in the feature set.

Figure 3 shows two noise statistics extracted from the CIR noise floor: maximum noise amplitude and noise standard deviation. In Figure 3a, the median noise maximum under LOS is 1248, slightly higher than the 1084 under NLOS. The LOS distribution is wider, which may result from receiver saturation caused by the direct path signal. Figure 3b shows the opposite pattern. The median noise standard deviation under NLOS is 72, higher than the 60 under LOS. The NLOS distribution is wider with more outliers, indicating stronger temporal fluctuations caused by multipath interference. In contrast, the noise under LOS is more stable. These differences show that noise statistics can characterize fundamental differences in the channel environment.

Figure 4 shows the distribution of CIR total power (CIR_PWR) under LOS and NLOS conditions. CIR_PWR represents the time-domain integrated energy. The median under LOS is 10,984, higher than the 7018 under NLOS, with a ratio of 1.6. The LOS distribution is more concentrated. This indicates that signal power is higher and more stable under LOS, consistent with the amplitude features in Figure 1. It confirms that CIR_PWR effectively distinguishes channel conditions and is included in the feature set.

The above analysis of CIR waveform features provides a physical basis for feature selection. Based on this, 10 discriminative features are selected from the CIR data. These features reduce dimensionality while retaining key information and serve as optimized input for the model in Section 3.3.

3.2. Snake Optimizer

The snake optimizer (SO) is a metaheuristic optimization algorithm proposed by Fatma A. Hashim and Abdelazim G. Hussien in 2022 [28]. The algorithm simulates the foraging and reproductive behaviors of snakes in nature and formulates these behaviors into an optimization process through mathematical modeling, enabling the solution of diverse optimization problems. The core mechanism of the SO compares a time-varying temperature with a threshold

T_{th}

(typically set to 0.6) to distinguish between two primary behavioral modes: global exploration (foraging) and local exploitation (mating). During initialization, a population of

N

snakes is randomly generated, where each individual represents a candidate solution. The population is then ranked according to fitness and evenly divided into a male group

M

and a female group

F

.

In each iteration, the environmental temperature

T

and the food quantity

Q

act as two key control variables. The temperature follows a simulated exponential cooling schedule, expressed as

T = \exp (- \frac{t}{t_{\max}})

, where

t

denotes the current iteration index and

t_{\max}

represents the maximum number of iterations.
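As a quick numerical illustration of the cooling schedule, the sketch below evaluates the temperature over a run and finds the first iteration at which it drops to the threshold. The run length of 100 iterations is a hypothetical value chosen for the example; the threshold 0.6 is the typical setting quoted above.

```python
import math

def temperature(t, t_max):
    """Exponential cooling schedule T = exp(-t / t_max)."""
    return math.exp(-t / t_max)

T_TH = 0.6      # temperature threshold quoted in the text
t_max = 100     # hypothetical run length for this example

# First iteration index at which T <= T_TH.
crossing = next(t for t in range(t_max + 1) if temperature(t, t_max) <= T_TH)
```

Under this schedule the temperature starts at 1 and ends at exp(-1), roughly 0.37, so with a threshold of 0.6 the condition T > T_th holds for about the first half of the run (here, up to iteration 51).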

The food quantity

Q

is defined as the ratio between the current best fitness value and the worst fitness value, serving as an indicator of food abundance within the search space:

Q = \frac{f_{best}}{f_{worst}}

(1)

where

f_{best}

and

f_{worst}

represent the best and worst fitness values within the current population, respectively.

When the temperature is low

(T \leq T_{th})

, the algorithm enters the exploration phase. This phase simulates the foraging behavior of a snake population, where movement strategies are further determined by the food availability

Q

.

When food is scarce (

Q < Q_{th}

, where

Q_{th}

is the food quantity threshold used to assess food availability and is typically set to 0.25), each snake individual

S_{i}

tends to perform a global random walk.

S_{i, new} = S_{rand} \pm r \cdot A

(2)

where

S_{i, new}

denotes the updated position of the snake individual,

S_{rand}

denotes the position vector of a randomly selected individual within the population,

r

is a random number uniformly distributed in the interval

[0, 1]

, and

A

is a predefined attack intensity constant that controls the step size of the random walk and is typically set to 0.5. When food is abundant, snakes move toward the known optimal food source position

S_{food}

(i.e., the position vector corresponding to the individual with the best fitness in the current population):

S_{i, new} = S_{food} \pm r \cdot ((S_{food} - S_{i}) \cdot T)

(3)

The factor

T

here serves to regulate the step size; the lower the temperature, the stronger the certainty of movement toward the food source.

When the environmental temperature is suitable

(T > T_{th})

, the algorithm switches to the exploitation phase, simulating mating and competitive behaviors among snakes. During this phase, male and female groups operate independently. Male individuals enter a combat mode, in which they move toward the current optimal male

S_{best, M}

(i.e., the position vector of the individual with the best fitness within the male group

M

), while simultaneously experiencing competitive interference from a randomly selected male

S_{rand, M}

(i.e., a randomly chosen position vector within the group

M

):

S_{i, new} = S_{i} \pm r \cdot (S_{best, M} - S_{rand, M})

(4)

Meanwhile, when food is abundant

(Q \geq Q_{th})

and the random mating condition is satisfied, a female individual

S_{female}

(a randomly selected position vector from the female group

F

) interacts with a selected male individual

S_{male}

(a randomly selected position vector from the male group

M

), producing an offspring position

S_{offspring}

, which is computed as the average of the two:

S_{offspring} = \frac{S_{male} + S_{female}}{2}

(5)

Newly generated offspring replace the worst individuals in the current population. This process maintains population size and introduces new search potential. After each position update, the algorithm checks boundaries and applies a greedy selection mechanism to retain superior solutions. These steps guide the population toward the global optimum. The dual regulation of temperature and food quantity controls exploration and exploitation behaviors. Specifically, varying temperature encourages the discovery of diverse solutions, while food quantity management intensifies the search around promising areas. Together, these mechanisms enable the snake optimizer to search efficiently and adaptively within the solution space.
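The update rules of this section can be sketched as a single iteration routine. This is a simplified illustration under several assumptions that are not stated in the text: the population is split into males and females by rank position, females are left unchanged during the combat step, the sign in each update is drawn at random, the mating condition is modeled as a fixed probability of 0.5, and positions are clipped to scalar bounds. The function name `so_step` and its arguments are hypothetical.

```python
import math
import random

T_TH, Q_TH, A = 0.6, 0.25, 0.5   # thresholds and step constant quoted in the text

def clip(x, lo, hi):
    return [min(max(v, lo), hi) for v in x]

def so_step(pop, f, t, t_max, lo, hi, rng):
    """One simplified snake-optimizer iteration for a minimization problem."""
    pop = sorted(pop, key=f)                      # best individual first
    half = len(pop) // 2
    males = pop[:half]                            # assumed split after ranking
    temp = math.exp(-t / t_max)                   # cooling schedule
    f_best, f_worst = f(pop[0]), f(pop[-1])
    q = f_best / f_worst if f_worst else 1.0      # Equation (1), guarded
    food = pop[0]
    new_pop = []
    for idx, s in enumerate(pop):
        r = rng.random()
        sign = rng.choice((-1.0, 1.0))
        if temp <= T_TH and q < Q_TH:             # Equation (2): random walk
            base = rng.choice(pop)
            cand = [b + sign * r * A for b in base]
        elif temp <= T_TH:                        # Equation (3): move toward food
            cand = [fd + sign * r * (fd - v) * temp for fd, v in zip(food, s)]
        elif idx < half:                          # Equation (4): male combat
            rival = rng.choice(males)
            cand = [v + sign * r * (b - rv)
                    for v, b, rv in zip(s, males[0], rival)]
        else:
            cand = s                              # assumption: females only mate
        cand = clip(cand, lo, hi)                 # boundary check
        new_pop.append(cand if f(cand) < f(s) else s)   # greedy selection
    if temp > T_TH and q >= Q_TH and rng.random() < 0.5:    # assumed mating chance
        male = rng.choice(new_pop[:half])
        female = rng.choice(new_pop[half:])
        child = [(m + w) / 2 for m, w in zip(male, female)]  # Equation (5)
        worst = max(range(len(new_pop)), key=lambda i: f(new_pop[i]))
        new_pop[worst] = child                    # offspring replaces the worst
    return new_pop
```

Calling `so_step` repeatedly with t = 1, ..., t_max on a population of lists and a scalar fitness function gives a complete, if simplified, optimization loop. Because of the greedy selection, the best fitness in the population never gets worse from one iteration to the next.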

3.3. ISO-MBP Model

To enhance the recognition accuracy of NLOS states in complex indoor UWB scenarios, this paper proposes an NLOS recognition model that integrates an ISO algorithm with a multi-head backpropagation network (MBP). The overall model consists of three stages: feature representation and initial classification based on BP–MHSA (stage A); an ISO incorporating iterative chaotic map with infinite collapses (ICMIC) chaotic mapping and differential evolution (DE) mechanisms (stage B); and global optimization of BP network parameters using ISO (stage C). An overview of the ISO-MBP framework is illustrated in Figure 5.

In stage A, the input feature vector of each sample is denoted as

a \in ℝ^{d}

, where

d = 10

represents the dimensionality of the input features. The BP neural network first applies a linear transformation followed by a nonlinear activation to the input features. The output of the first hidden layer is given by:

h_{1} = σ (W_{1} a + b_{1})

(6)

where

W_{1} \in ℝ^{H_{1} \times d}

and

b_{1} \in ℝ^{H_{1}}

denote the weight matrix and bias vector of the first hidden layer, respectively.

H_{1}

represents the number of neurons in the hidden layer, and

σ (\cdot)

denotes the ReLU activation function.

To enhance the model’s capability to capture correlations across different feature subspaces, an MHSA mechanism is introduced after the first hidden layer. This mechanism operates on the feature representation space rather than on temporal modeling. To improve feature propagation stability and accelerate convergence, residual connections and layer normalization are incorporated into the attention module. The hidden layer output

h_{1}

is treated as a single-step feature embedding, and its Query, Key, and Value vectors are defined as follows:

Q = h_{1} W_{Q}, K = h_{1} W_{K}, V = h_{1} W_{V}

(7)

where

W_{Q}

,

W_{K}

, and

W_{V}

are learnable projection matrices. Within a single attention head, the scaled dot-product attention is computed as follows:

Attention (Q, K, V) = softmax (\frac{Q K^{⊤}}{\sqrt{d_{k}}}) V

(8)

where

d_{k}

denotes the dimensionality of the key vector. Multi-head attention constructs

H

attention heads in parallel to model the input representation across different feature subspaces, yielding the following output:

h_{att} = Concat ({head}_{1}, \dots, {head}_{H}) W_{O}

(9)

where

W_{O}

denotes the output projection matrix.
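For readers who want to see Equation (8) in executable form, the following is a minimal single-head sketch using plain Python lists. The helper names `softmax`, `matmul`, and `attention` are hypothetical, and the learnable projections of Equation (7) and the output projection of Equation (9) are omitted, so the function takes the Query, Key, and Value matrices directly.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Matrix product of two row-major list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]              # K transposed
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights
```

A multi-head version would run this function once per head on separately projected inputs and concatenate the results before applying the output projection.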

Subsequently, the attention-enhanced feature representations are fed into the subsequent hidden layers, with the computation performed as follows:

h_{2} = σ (W_{2} h_{att} + b_{2})

(10)

where

W_{2}

and

b_{2}

denote the weight matrix and bias vector of the second hidden layer, respectively. Finally, the output layer produces the classification result for each sample:

o = W_{3} h_{2} + b_{3}

(11)

where

W_{3}

and

b_{3}

represent the weight matrix and bias vector of the output layer, respectively. During model training, the cross-entropy loss function

L (\cdot)

is employed as the objective function to quantify the discrepancy between the predicted results and the ground-truth labels.
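The dense layers of Equations (6), (10), and (11) and the cross-entropy objective can be sketched as follows. The two-element output (one logit per class, LOS and NLOS) is an assumption of this sketch, since the output dimensionality is not stated here; the function names `dense` and `cross_entropy` are likewise illustrative.

```python
import math

def dense(w, b, x, relu=True):
    """Affine layer W x + b, optionally followed by ReLU."""
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi
           for row, bi in zip(w, b)]
    return [max(0.0, v) for v in out] if relu else out

def cross_entropy(logits, label):
    """Softmax cross-entropy of one sample, computed from raw logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]
```

For example, two equal logits give a loss of ln 2 regardless of the label, and the loss decreases as the logit of the correct class grows relative to the other.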

In stage B, ICMIC chaotic mapping is employed for population initialization to alleviate uneven initial population distribution in high-dimensional search spaces. Meanwhile, a DE mechanism is incorporated during iteration to enhance global search capability. The ICMIC chaotic sequence is generated iteratively by the following equation:

z_{k + 1} = \sin (\frac{a π}{z_{k}}), a \in (0, 1)

(12)

where

z_{k}

denotes the chaotic value at the

k -th

iteration, and

a

is the chaotic parameter, which is set to 0.9 in this study.

For the

i -th

snake individual, a

d

-dimensional chaotic vector

z^{(i)} = [z_{1}^{(i)}, \dots, z_{d}^{(i)}]

is first generated, with each component independently produced according to Equation (12). The vector is then mapped to the solution space

[l, u]

:

S_{i}^{(0)} = l + z^{(i)} ⊙ (u - l)

(13)

where

S_{i}^{(0)}

denotes the initial position of the

i -th

snake individual;

l

and

u

represent the lower and upper bounds of the search space, respectively; and

⊙

denotes element-wise multiplication.
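The chaotic initialization of Equations (12) and (13) can be sketched as below. Three choices here are assumptions of this sketch rather than steps stated in the text: each individual's chaotic vector is produced by iterating the map from one random seed value, the absolute value of each chaotic component is taken so that it lies in [0, 1] before the mapping of Equation (13) (the sine map itself yields values in [-1, 1]), and an exact zero is nudged away to avoid division by zero. The function names are hypothetical.

```python
import math
import random

def icmic_sequence(z0, length, a=0.9):
    """Iterate z_{k+1} = sin(a * pi / z_k) from a nonzero starting value z0."""
    seq, z = [], z0
    for _ in range(length):
        z = math.sin(a * math.pi / z)
        if z == 0.0:            # guard: the next step divides by z
            z = 1e-12
        seq.append(z)
    return seq

def chaotic_population(n, d, lower, upper, rng):
    """Build n initial positions of dimension d inside [lower, upper]."""
    pop = []
    for _ in range(n):
        z0 = rng.uniform(0.1, 1.0)                     # assumed seed range
        z = [abs(v) for v in icmic_sequence(z0, d)]    # assumed fold into [0, 1]
        pop.append([l + zj * (u - l)                   # Equation (13)
                    for l, zj, u in zip(lower, z, upper)])
    return pop
```

Passing a seeded `random.Random` instance as `rng` makes the initial population reproducible across runs.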

After each SO position update, a DE perturbation is applied.

(1): Mutation: a mutant vector is generated for the current individual $S_{i}$ by randomly selecting three distinct individuals $S_{r_{1}}$ , $S_{r_{2}}$ , $S_{r_{3}}$ :

$v_{i} = S_{r_{1}} + μ \cdot (S_{r_{2}} - S_{r_{3}})$

(14)

where $μ$ is the mutation factor, which is set to 0.3 in this study, and $r_{1} \neq r_{2} \neq r_{3} \neq i$ .
(2): Crossover: a trial vector $u_{i}$ is generated using binomial crossover:

$u_{i, j} = \begin{cases} v_{i, j}, & rand \leq CR \text{ or } j = j_{rand} \\ S_{i, j}, & \text{otherwise} \end{cases}$

(15)

where $CR$ denotes the crossover probability, which is set to 0.8 in this study; $j$ is the index of the current dimension ( $j = 1, 2, \dots, d$ ); $j_{rand}$ is a randomly selected dimension index that ensures at least one component is inherited from the mutant vector; and $rand$ denotes a uniformly distributed random number in the interval $[0, 1]$ .
(3): Selection: the fitness values of the trial vector $u_{i}$ and the current individual $S_{i}$ are compared, and the better one is retained.

$S_{i} = \begin{cases} u_{i}, & f (u_{i}) < f (S_{i}) \\ S_{i}, & \text{otherwise} \end{cases}$

(16)
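The three DE steps above can be sketched as one pass over the population. This is an illustrative sketch, not the authors' implementation; the function name `de_step` is hypothetical, the default values of the mutation factor and crossover probability are the ones quoted in the text, and the population must contain at least four individuals so that three distinct partners can be drawn.

```python
import random

def de_step(pop, f, rng, mu=0.3, cr=0.8):
    """One differential-evolution pass: mutation, binomial crossover, selection."""
    d = len(pop[0])
    new_pop = []
    for i, s in enumerate(pop):
        # Three distinct partners, all different from the current individual.
        others = [k for k in range(len(pop)) if k != i]
        r1, r2, r3 = rng.sample(others, 3)
        # Equation (14): mutant vector.
        v = [pop[r1][j] + mu * (pop[r2][j] - pop[r3][j]) for j in range(d)]
        # Equation (15): binomial crossover; j_rand forces one mutant component.
        j_rand = rng.randrange(d)
        u = [v[j] if rng.random() <= cr or j == j_rand else s[j]
             for j in range(d)]
        # Equation (16): keep the trial vector only if it is strictly better.
        new_pop.append(u if f(u) < f(s) else s)
    return new_pop
```

Because selection is greedy per individual, no individual's fitness can get worse after a call to `de_step`.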

The complete workflow of the ISO is as follows. The population is first initialized using chaotic mapping according to Equation (13). In each iteration, the SO position update described in Section 3.2 (Equations (2)–(5)) is performed, followed by DE perturbation (Equations (14)–(16)). Once the termination criterion is satisfied, the global optimal solution $S^{*}$ is output.

In stage C, all trainable parameters of the BP neural network are encoded into a one-dimensional vector. This vector defines the position $S$ of each snake individual in the ISO algorithm and serves as the optimization target:

S = [\operatorname{vec} (W_{1}), b_{1}, \operatorname{vec} (W_{2}), b_{2}, \operatorname{vec} (W_{3}), b_{3}]

(17)

where $\operatorname{vec} (\cdot)$ denotes the vectorization operation, which expands matrices column-wise into one-dimensional vectors. The ISO algorithm employs the classification loss on the BP neural network’s training set as the fitness function:

f (S_{i}) = L (S_{i})

(18)

where $f (\cdot)$ denotes the fitness function during the optimization process, and $L (\cdot)$ represents the cross-entropy loss function.

Through continuous updates of individual snakes within the search space, the global optimal parameters are ultimately obtained:

S^{*} = \arg \min f (S)

(19)

where $\arg \min$ returns the argument that minimizes the objective function. $S^{*}$ is then decoded into the network’s initial parameters for gradient descent training. This fusion of global search and local gradient learning enhances both the performance and robustness of the NLOS identification model.
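The encoding in Equation (17) and the fitness in Equation (18) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `encode`, `decode`, and `fitness`, the layer shapes used in any call, and the `tanh` hidden activation are hypothetical and are not taken from the source.

```python
import numpy as np

def encode(params):
    """Flatten [(W1, b1), (W2, b2), ...] into one vector as in Eq. (17)."""
    parts = []
    for W, b in params:
        parts.append(W.flatten(order="F"))  # column-wise vec(W)
        parts.append(b.ravel())
    return np.concatenate(parts)

def decode(S, shapes):
    """Inverse of encode; shapes lists (rows, cols) of each weight matrix."""
    params, pos = [], 0
    for rows, cols in shapes:
        W = S[pos:pos + rows * cols].reshape((rows, cols), order="F")
        pos += rows * cols
        b = S[pos:pos + rows]
        pos += rows
        params.append((W, b))
    return params

def fitness(S, shapes, X, y):
    """Eq. (18): mean cross-entropy of the decoded network on (X, y)."""
    params = decode(S, shapes)
    a = X
    for idx, (W, b) in enumerate(params):
        a = a @ W.T + b
        if idx < len(params) - 1:
            a = np.tanh(a)  # hidden activation is an assumption
    a = a - a.max(axis=1, keepdims=True)  # stabilize the softmax
    log_p = a - np.log(np.exp(a).sum(axis=1, keepdims=True))
    return float(-log_p[np.arange(len(y)), y].mean())
```

Decoding $S^{*}$ with `decode` would then give the initial weights and biases handed to gradient descent training.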

4. NLOS Error Mitigation Model

4.1. CTRV Model

The CTRV model is a widely adopted kinematic representation for maneuvering target tracking. Compared with the conventional constant velocity (CV) and constant acceleration (CA) models, the CTRV model more accurately captures the turning behaviors of moving targets, such as vehicles and pedestrians, during planar motion. The CTRV model represents planar motion by explicitly incorporating the heading angle and angular velocity, thereby decomposing the motion into linear and angular components. Accordingly, the CTRV state vector is defined as a five-dimensional vector:

x_{k} = {[x_{k}, y_{k}, v_{k}, ψ_{k}, ω_{k}]}^{T}

(20)

where $(x_{k}, y_{k})$ denotes the target position, $v_{k}$ represents the linear velocity magnitude, $ψ_{k}$ corresponds to the heading angle measured with respect to the x-axis, and $ω_{k}$ denotes the angular velocity (turn rate).

The state transition equations of the CTRV model are derived from kinematic principles. When the angular velocity satisfies $|ω_{k}| \geq ϵ$, where $ϵ$ denotes a predefined small threshold (e.g., $ϵ = 10^{- 6}$), the target is assumed to follow a circular trajectory:

\begin{cases} x_{k + 1} = x_{k} + \frac{v_{k}}{ω_{k}} [\sin (ψ_{k} + ω_{k} Δ t) - \sin ψ_{k}] \\ y_{k + 1} = y_{k} + \frac{v_{k}}{ω_{k}} [- \cos (ψ_{k} + ω_{k} Δ t) + \cos ψ_{k}] \\ ψ_{k + 1} = ψ_{k} + ω_{k} Δ t \\ v_{k + 1} = v_{k} \\ ω_{k + 1} = ω_{k} \end{cases}

(21)

When $|ω_{k}| < ϵ$, the model reduces to linear motion:

\begin{cases} x_{k + 1} = x_{k} + v_{k} Δ t \cos ψ_{k} \\ y_{k + 1} = y_{k} + v_{k} Δ t \sin ψ_{k} \\ ψ_{k + 1} = ψ_{k} \\ v_{k + 1} = v_{k} \\ ω_{k + 1} = ω_{k} \end{cases}

(22)

where $x_{k + 1}$ and $y_{k + 1}$ represent the predicted position, and $ψ_{k + 1}$, $v_{k + 1}$, and $ω_{k + 1}$ represent the predicted heading angle, linear velocity, and angular velocity at time step $k + 1$, respectively. $Δ t$ denotes the sampling interval between two consecutive time instants.

The CTRV model provides a physically interpretable and computationally efficient representation of turning motions through a compact five-dimensional state. It can consistently represent both straight-line and turning motions, demonstrating strong adaptability across different motion patterns. Accordingly, the CTRV model is adopted as the fundamental motion model in this work to characterize target motion in UWB positioning scenarios accurately.
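The two branches of the CTRV transition (Equations (21) and (22)) can be written as a short function. The name `ctrv_step` and the tuple layout, which follows the state ordering of Equation (20), are illustrative choices rather than the authors' implementation.

```python
import math

def ctrv_step(state, dt, eps=1e-6):
    """Propagate a CTRV state (x, y, v, psi, omega) over one interval dt."""
    x, y, v, psi, omega = state
    if abs(omega) >= eps:
        # Turning motion, Eq. (21): the target follows a circular arc.
        x_new = x + v / omega * (math.sin(psi + omega * dt) - math.sin(psi))
        y_new = y + v / omega * (-math.cos(psi + omega * dt) + math.cos(psi))
        psi_new = psi + omega * dt
    else:
        # Near-zero turn rate, Eq. (22): straight-line motion.
        x_new = x + v * dt * math.cos(psi)
        y_new = y + v * dt * math.sin(psi)
        psi_new = psi
    return (x_new, y_new, v, psi_new, omega)
```

The threshold branch avoids the division by $ω_{k}$ when the turn rate is close to zero; for small but non-zero turn rates the two branches give nearly identical positions.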

4.2. UKF

In nonlinear state estimation, the EKF linearizes the system by applying a first-order Taylor expansion. However, it can produce substantial errors when the system is highly nonlinear. The UKF uses the UT to handle nonlinearities directly, eliminating the need to compute Jacobian matrices. This approach provides improved estimation accuracy and enhanced numerical stability. The core principle of the UT is to choose a set of sigma points that accurately represent the mean and covariance of the input random variable. These points are then propagated through the nonlinear function, and the resulting output points are used to approximate the transformed random variable. In this work, the UKF is employed to propagate the CTRV-based motion states and to provide physically consistent state estimates, which are subsequently refined by the BiLSTM through temporal error correction.

Let the state vector $x_{k - 1}$ at time $k - 1$ have mean ${\bar{x}}_{k - 1}$ and covariance matrix $P_{k - 1}$. The UT generates $2 n + 1$ sigma points according to the following procedure, where $n$ denotes the state dimension:

X_{k - 1}^{(0)} = {\bar{x}}_{k - 1}

(23)

X_{k - 1}^{(i)} = {\bar{x}}_{k - 1} + {(\sqrt{(n + λ) P_{k - 1}})}_{i}, i = 1, \dots, n

(24)

X_{k - 1}^{(i)} = {\bar{x}}_{k - 1} - {(\sqrt{(n + λ) P_{k - 1}})}_{i - n}, i = n + 1, \dots, 2 n

(25)

where ${(\sqrt{P})}_{i}$ denotes the $i$-th column of the matrix square root. The corresponding weight for each sigma point is:

W_{m}^{(0)} = \frac{λ}{n + λ}

(26)

W_{c}^{(0)} = \frac{λ}{n + λ} + (1 - α^{2} + β)

(27)

W_{m}^{(i)} = W_{c}^{(i)} = \frac{1}{2 (n + λ)}, i = 1, \dots, 2 n

(28)

where $W_{m}^{(i)}$ represents the weight for the mean, and $W_{c}^{(i)}$ denotes the weight for the covariance of the sigma points, with the superscript $i$ indicating the index of each sample point. The scaling parameter is $λ = α^{2} (n + κ) - n$, and $α \in (0, 1]$ controls the spread of the sigma points to ensure that the nonlinear function is accurately approximated within the local region. $κ$ is an auxiliary scaling parameter related to the state dimension, typically set to 0. $β$ is used to incorporate prior statistical information of the state distribution into the covariance calculation, specifically by adjusting the covariance weight of the central sigma point. When the state variables are approximately Gaussian, setting $β = 2$ effectively compensates for higher-order statistical errors. This common choice is adopted in this work.
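As an illustration of Equations (23)–(28), the sketch below generates the sigma points and both weight sets with NumPy. The function name `sigma_points`, the use of a Cholesky factor as the matrix square root, and the default `alpha=0.5` are assumptions for this example; the value of $α$ used by the authors is not given in this section.

```python
import numpy as np

def sigma_points(mean, cov, alpha=0.5, beta=2.0, kappa=0.0):
    """Return (points, mean weights, covariance weights) of the UT."""
    n = mean.shape[0]
    lam = alpha ** 2 * (n + kappa) - n
    # Lower-triangular Cholesky factor used as the matrix square root.
    S = np.linalg.cholesky((n + lam) * cov)
    pts = np.empty((2 * n + 1, n))
    pts[0] = mean                        # Eq. (23)
    for i in range(n):
        pts[i + 1] = mean + S[:, i]      # Eq. (24)
        pts[n + i + 1] = mean - S[:, i]  # Eq. (25)
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))          # Eq. (28)
    wc = wm.copy()
    wm[0] = lam / (n + lam)                                   # Eq. (26)
    wc[0] = lam / (n + lam) + (1.0 - alpha ** 2 + beta)       # Eq. (27)
    return pts, wm, wc
```

By construction, the mean weights sum to one, and the weighted sample mean and covariance of the points reproduce the input mean and covariance.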

The UKF algorithm proceeds as follows:

(1) Initialization: initialize the state ${\hat{x}}_{0 | 0}$ and the corresponding covariance matrix $P_{0 | 0}$.

(2) Prediction step: according to Equations (23)–(25), a set of sigma points $X_{k - 1}^{(i)}$ is generated and propagated through the state transition function $f (\cdot)$ to obtain the predicted sigma points $X_{k | k - 1}^{(i)} = f (X_{k - 1}^{(i)}, Δ t)$. The predicted state mean and covariance are then computed as follows:

${\hat{x}}_{k | k - 1} = \sum_{i = 0}^{2 n} W_{m}^{(i)} X_{k | k - 1}^{(i)}$

(29)

$P_{k | k - 1} = \sum_{i = 0}^{2 n} W_{c}^{(i)} (X_{k | k - 1}^{(i)} - {\hat{x}}_{k | k - 1}) {(X_{k | k - 1}^{(i)} - {\hat{x}}_{k | k - 1})}^{T} + Q$

(30)

where $Q$ denotes the process noise covariance matrix.

(3) Update step: regenerate the sigma points and compute the predicted observations:

$Z_{k | k - 1}^{(i)} = h (X_{k | k - 1}^{(i)})$

(31)

The predicted observation mean, covariance, and cross-covariance are then calculated as follows:

{\hat{z}}_{k | k - 1} = \sum_{i = 0}^{2 n} W_{m}^{(i)} Z_{k | k - 1}^{(i)}

(32)

P_{z z} = \sum_{i = 0}^{2 n} W_{c}^{(i)} (Z_{k | k - 1}^{(i)} - {\hat{z}}_{k | k - 1}) {(Z_{k | k - 1}^{(i)} - {\hat{z}}_{k | k - 1})}^{T} + R

(33)

P_{x z} = \sum_{i = 0}^{2 n} W_{c}^{(i)} (X_{k | k - 1}^{(i)} - {\hat{x}}_{k | k - 1}) {(Z_{k | k - 1}^{(i)} - {\hat{z}}_{k | k - 1})}^{T}

(34)

where $R$ is the observation noise covariance matrix.

The Kalman gain is calculated, and the state estimate and covariance are updated based on the observation residual as follows:

K_{k} = P_{x z} {P_{z z}}^{- 1}

(35)

{\hat{x}}_{k | k} = {\hat{x}}_{k | k - 1} + K_{k} (z_{k} - {\hat{z}}_{k | k - 1})

(36)

P_{k | k} = P_{k | k - 1} - K_{k} P_{z z} {K_{k}}^{T}

(37)

where $z_{k}$ denotes the observation vector at time $k$.

The UKF employs the UT to directly handle nonlinear functions, thereby avoiding the linearization errors inherent in the EKF. This property makes it particularly suitable for state estimation in strongly nonlinear motion models such as CTRV. In Section 4.4, the CTRV model is integrated with the UKF to construct a UKF–BiLSTM bidirectional mutual calibration localization framework.
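One prediction and update cycle (Equations (29)–(37)) can be sketched as follows. The names `ukf_step` and `_sigma`, the Cholesky square root, and the default `alpha=0.5` are assumptions for illustration. The sketch regenerates the sigma points from the predicted mean and covariance before the update, as described in step (3), and averages all state components linearly, which ignores heading-angle wrap-around.

```python
import numpy as np

def _sigma(mean, cov, lam):
    """Stack the 2n+1 sigma points of Eqs. (23)-(25) as rows."""
    n = mean.size
    S = np.linalg.cholesky((n + lam) * cov)
    return np.vstack([mean, mean + S.T, mean - S.T])

def ukf_step(x, P, z, f, h, Q, R, dt, alpha=0.5, beta=2.0, kappa=0.0):
    """One UKF cycle; f(state, dt) and h(state) are user-supplied models."""
    n = x.size
    lam = alpha ** 2 * (n + kappa) - n
    wm = np.full(2 * n + 1, 0.5 / (n + lam))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1.0 - alpha ** 2 + beta)
    # Prediction, Eqs. (29)-(30).
    X = np.array([f(s, dt) for s in _sigma(x, P, lam)])
    x_pred = wm @ X
    dX = X - x_pred
    P_pred = dX.T @ (wc[:, None] * dX) + Q
    # Update: regenerate sigma points, then Eqs. (31)-(34).
    Xr = _sigma(x_pred, P_pred, lam)
    Z = np.array([h(s) for s in Xr])
    z_pred = wm @ Z
    dZ = Z - z_pred
    dXr = Xr - x_pred
    Pzz = dZ.T @ (wc[:, None] * dZ) + R
    Pxz = dXr.T @ (wc[:, None] * dZ)
    # Gain and posterior, Eqs. (35)-(37).
    K = Pxz @ np.linalg.inv(Pzz)
    x_new = x_pred + K @ (z - z_pred)
    P_new = P_pred - K @ Pzz @ K.T
    return x_new, P_new
```

For linear `f` and `h`, this cycle reduces to the standard Kalman filter, which gives a simple way to sanity-check an implementation.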

4.3. BiLSTM

LSTM addresses the vanishing gradient problem in traditional RNNs via its gating mechanism and is therefore well-suited for modeling long sequences. To further enhance temporal feature extraction, a BiLSTM architecture is employed, as shown in Figure 6. The figure illustrates the bidirectional flow of information between the input layer, the forward and backward LSTM layers, and the output layer.

(1) Input layer: the layer receives the input feature vectors $X_{k - 1}$, $X_{k}$, and $X_{k + 1} \in ℝ^{d_{i n}}$ at time steps $k - 1$, $k$, and $k + 1$, where $d_{i n}$ denotes the total input dimension. At each time step, the input integrates multi-source information:

$X_{k} = [r_{k}, x_{k}^{UKF}, p_{k - 1}^{LSTM}]$

(38)

where $r_{k} \in ℝ^{N_{a}}$ is the vector of $N_{a}$ UWB ranging values (one per anchor; $N_{a}$ corresponds to $M$ in Section 4.4), $x_{k}^{UKF} \in ℝ^{5}$ is the UKF state vector, and $p_{k - 1}^{LSTM} \in ℝ^{2}$ is the LSTM position estimate vector from the previous time step.

(2) Forward LSTM layer: the layer processes the forward time sequence $(k - 1 \to k \to k + 1)$ and computes the forward hidden states:

${\vec{h}}_{k} = LSTM (X_{k}, {\vec{h}}_{k - 1})$

(39)

(3) Backward LSTM layer: the layer processes the backward time sequence $(k + 1 \to k \to k - 1)$ and computes the backward hidden states:

${\overset{\leftarrow}{h}}_{k} = LSTM (X_{k}, {\overset{\leftarrow}{h}}_{k + 1})$

(40)

(4) Output layer: the forward and backward hidden states are concatenated and passed through a fully connected layer to produce the final output:

$Y_{k} = g ([{\vec{h}}_{k}; {\overset{\leftarrow}{h}}_{k}])$

(41)

where $g (\cdot)$ is a linear projection layer.

Compared to unidirectional LSTMs, BiLSTMs model the temporal context more comprehensively by leveraging both past and future information. This allows BiLSTMs to capture temporally dependent error patterns more effectively and produce smoother, continuous predictions.
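The data flow of Equations (39)–(41) can be made concrete with a small NumPy sketch of a single bidirectional layer using the standard LSTM gate equations. The function names, the packing of the four gates into one weight matrix, and the zero initial states are assumptions for illustration; this is not the authors' network, which stacks two bidirectional layers (Section 4.4).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_pass(X, W, U, b, reverse=False):
    """Run one LSTM direction over X of shape (T, d_in); returns (T, H).

    W: (4H, d_in), U: (4H, H), b: (4H,) hold the input, forget,
    candidate, and output gate parameters stacked in that order.
    """
    T, H = X.shape[0], U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    out = np.zeros((T, H))
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for k in steps:
        i, f, g, o = np.split(W @ X[k] + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out[k] = h
    return out

def bilstm_output(X, fwd, bwd, W_o, b_o):
    """Eqs. (39)-(41): concatenate both directions, then project linearly."""
    h_f = lstm_pass(X, *fwd)                # forward states, Eq. (39)
    h_b = lstm_pass(X, *bwd, reverse=True)  # backward states, Eq. (40)
    return np.concatenate([h_f, h_b], axis=1) @ W_o.T + b_o  # Eq. (41)
```

The forward state at step $k$ depends only on inputs up to $k$, while the backward state depends only on inputs from $k$ onward, so the concatenation sees the whole window.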

4.4. UKF-BiLSTM Bidirectional Mutual Correction Model

To address the challenges posed by nonlinear motion and complex environmental interference in UWB positioning, a UKF–BiLSTM bidirectional mutual calibration model is proposed. The proposed framework integrates filtering and learning in a tightly coupled manner through three stages: initial positioning using a CTRV-based UKF (stage A), temporal error correction with a BiLSTM (stage B), and a bidirectional mutual calibration closed-loop (stage C). An overview of the UKF–BiLSTM framework is illustrated in Figure 7.

Stage A adopts the CTRV motion model as the state transition function $f (x_{k}, Δ t)$ to obtain an initial estimate of the target state. In the following UKF prediction equations, the sigma points $X_{k - 1}^{(i)}$ at time $k - 1$ are propagated to obtain the predicted sigma points $X_{k | k - 1}^{(i)}$ at time $k$.

(1) Turning motion $(| ω_{k - 1}^{(i)} | \geq ϵ)$:

X_{k | k - 1}^{(i)} = [\begin{matrix} x_{k - 1}^{(i)} + \frac{v_{k - 1}^{(i)}}{ω_{k - 1}^{(i)}} [\sin (ψ_{k - 1}^{(i)} + ω_{k - 1}^{(i)} Δ t) - \sin ψ_{k - 1}^{(i)}] \\ y_{k - 1}^{(i)} + \frac{v_{k - 1}^{(i)}}{ω_{k - 1}^{(i)}} [- \cos (ψ_{k - 1}^{(i)} + ω_{k - 1}^{(i)} Δ t) + \cos ψ_{k - 1}^{(i)}] \\ v_{k - 1}^{(i)} \\ ψ_{k - 1}^{(i)} + ω_{k - 1}^{(i)} Δ t \\ ω_{k - 1}^{(i)} \end{matrix}]

(42)

(2) Straight-line motion $(| ω_{k - 1}^{(i)} | < ϵ)$:

X_{k | k - 1}^{(i)} = [\begin{matrix} x_{k - 1}^{(i)} + v_{k - 1}^{(i)} Δ t \cos ψ_{k - 1}^{(i)} \\ y_{k - 1}^{(i)} + v_{k - 1}^{(i)} Δ t \sin ψ_{k - 1}^{(i)} \\ v_{k - 1}^{(i)} \\ ψ_{k - 1}^{(i)} \\ ω_{k - 1}^{(i)} \end{matrix}]

(43)

where the superscript $(i)$ indicates that all state variables belong to the $i$-th sigma point at time $k - 1$, and $ϵ = 10^{- 6}$ denotes the threshold used to distinguish between straight-line and turning motion.

For each sigma point $X_{k | k - 1}^{(i)}$, the measurement function $h (x)$ maps the predicted state to $M$ UWB range measurements. For brevity, the subscript $k | k - 1$ is dropped, and quantities belonging to the $i$-th predicted sigma point carry the superscript $(i)$. For the $j$-th anchor located at $(a_{x_{j}}, a_{y_{j}})$, the predicted range is:

h_{j} (x^{(i)}) = \sqrt{{(x^{(i)} - a_{x_{j}})}^{2} + {(y^{(i)} - a_{y_{j}})}^{2}}, j = 1, \dots, M

(44)
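Equation (44) amounts to a Euclidean distance from the position components of a sigma point to each anchor. A minimal sketch, with the hypothetical function name `predicted_ranges` and illustrative anchor coordinates supplied by the caller:

```python
import numpy as np

def predicted_ranges(state, anchors):
    """Eq. (44): distance from the state's (x, y) to each of the M anchors.

    state: sequence whose first two entries are x and y.
    anchors: (M, 2) array-like of anchor coordinates.
    """
    pos = np.asarray(state[:2], dtype=float)
    return np.linalg.norm(np.asarray(anchors, dtype=float) - pos, axis=1)
```

Only the position components enter the measurement; velocity, heading, and turn rate influence the predicted ranges indirectly through the motion model.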

The process noise covariance matrix

Q

is designed to characterize the uncertainty associated with each state component under the CTRV model:

Q = diag (σ_{x}^{2}, σ_{y}^{2}, σ_{v}^{2}, σ_{ψ}^{2}, σ_{ω}^{2})

(45)

where $diag (\cdot)$ denotes a diagonal matrix whose diagonal elements are given by the enclosed terms. $σ_{x}^{2}$ and $σ_{y}^{2}$ represent the uncertainties in positional coordinates, corresponding to positioning errors; $σ_{v}^{2}$ denotes the noise variance of linear velocity, characterizing uncertainty in velocity variations; $σ_{ψ}^{2}$ represents the noise variance of the heading angle, accounting for direction estimation errors; and $σ_{ω}^{2}$ denotes the noise variance of angular velocity, characterizing variations in the turning rate.

The measurement noise covariance matrix is:

R = σ_{r}^{2} I_{M}

(46)

where $σ_{r} = 0.25\,\text{m}$ denotes the standard deviation of the UWB ranging measurement error, and $I_{M}$ represents the $M \times M$ identity matrix.

In stage A, the complete UKF state sequence is obtained:

X^{UKF} = \{x_{k}^{UKF}\}_{k = 1}^{N}, \quad x_{k}^{UKF} \in ℝ^{5}

(47)

This sequence includes position estimates $(x_{k}, y_{k})$ and motion states $(v_{k}, ψ_{k}, ω_{k})$, thereby providing multidimensional input features for the BiLSTM corrector in stage B.

Stage B employs a BiLSTM network to capture the temporal patterns of UKF estimation errors, thereby enabling adaptive error correction. At each time step, the input vector integrates three types of information: the $M$ UWB ranging measurements $r_{k} \in ℝ^{M}$, the full UKF state vector $x_{k}^{UKF} \in ℝ^{5}$, and the previous position estimate $p_{k - 1}^{LSTM} \in ℝ^{2}$ generated by the BiLSTM. Accordingly, the overall input dimensionality is $d_{i n} = M + 5 + 2$ (e.g., $d_{i n} = 15$ for the eight-anchor environments described in Section 5.3.1). A sliding window with a length of $L_{w} = 10$ is adopted to construct the sequential input samples.
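The construction of the windowed input can be sketched as follows. The function name `build_windows` and the stride of one step between consecutive windows are assumptions; the source specifies only the feature composition and the window length.

```python
import numpy as np

def build_windows(ranges, ukf_states, prev_pos, window=10):
    """Fuse per-step features and cut overlapping windows of length L_w.

    ranges: (N, M) UWB ranges; ukf_states: (N, 5) UKF states;
    prev_pos: (N, 2), where row k holds the position estimate of step k-1.
    Returns an array of shape (N - window + 1, window, M + 7).
    """
    feats = np.hstack([ranges, ukf_states, prev_pos])  # d_in = M + 5 + 2
    n = feats.shape[0]
    # Stride of one step between windows is an assumption.
    return np.stack([feats[s:s + window] for s in range(n - window + 1)])
```

With eight anchors and 25 time steps, for instance, this yields 16 windows of shape 10 × 15.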

The BiLSTM architecture consists of two stacked bidirectional LSTM layers, followed by an output layer that maps the bidirectional hidden states to the position error:

{\hat{e}}_{k} = W_{o} [{\vec{h}}_{k}; {\overset{\leftarrow}{h}}_{k}] + b_{o} \in ℝ^{2}

(48)

where $W_{o}$ denotes the weight matrix of the output layer, $b_{o}$ represents the corresponding bias vector, and ${\vec{h}}_{k}$ and ${\overset{\leftarrow}{h}}_{k}$ denote the forward and backward hidden states of the BiLSTM, respectively.

The error predicted by the BiLSTM is subsequently incorporated into the UKF estimation process:

p_{k}^{corr} = p_{k}^{UKF} + {\hat{e}}_{k}

(49)

where $p_{k}^{UKF}$ represents the position coordinates estimated by the UKF, while $p_{k}^{corr}$ denotes the corrected position.

Stage C establishes a dynamic feedback mechanism between the UKF and the BiLSTM to enable adaptive parameter adjustment. The residual between the UKF estimate and the BiLSTM-corrected result is computed as follows:

Δ_{k} = p_{k}^{corr} - p_{k}^{UKF}

(50)

Meanwhile, the trajectory curvature is calculated to characterize the motion complexity:

κ_{k} = \frac{| (p_{k}^{(x)} - p_{k - 1}^{(x)}) (p_{k + 1}^{(y)} - p_{k}^{(y)}) - (p_{k}^{(y)} - p_{k - 1}^{(y)}) (p_{k + 1}^{(x)} - p_{k}^{(x)}) |}{‖ p_{k} - p_{k - 1} ‖ \cdot ‖ p_{k + 1} - p_{k} ‖ + ϵ}

(51)

where $p_{k} = {[p_{k}^{(x)}, p_{k}^{(y)}]}^{T}$ is the UKF position estimate at the current time step, with $p_{k}^{(x)}$ and $p_{k}^{(y)}$ denoting its $x$- and $y$-components; $p_{k - 1} = {[p_{k - 1}^{(x)}, p_{k - 1}^{(y)}]}^{T}$ represents the UKF posterior position estimate at time $k - 1$; and $p_{k + 1} = {[p_{k + 1}^{(x)}, p_{k + 1}^{(y)}]}^{T}$ denotes the UKF prior position prediction at time $k + 1$. The numerator is the magnitude of the pseudo-scalar (two-dimensional cross) product of the adjacent displacement vectors $p_{k} - p_{k - 1}$ and $p_{k + 1} - p_{k}$ and reflects the magnitude of trajectory direction changes. The denominator is the product of the norms of these vectors, which normalizes the curvature calculation, with the small constant $ϵ$ preventing division by zero.
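Equation (51) can be evaluated directly from three consecutive positions. In the sketch below, the function name `curvature` and the default `eps=1e-6` are assumptions. Because the cross product is divided by the product of the two displacement norms, the value equals the absolute sine of the turning angle and therefore lies in $[0, 1)$.

```python
import numpy as np

def curvature(p_prev, p_curr, p_next, eps=1e-6):
    """Eq. (51): normalized 2-D cross product of adjacent displacements."""
    a = np.asarray(p_curr, dtype=float) - np.asarray(p_prev, dtype=float)
    b = np.asarray(p_next, dtype=float) - np.asarray(p_curr, dtype=float)
    cross = a[0] * b[1] - a[1] * b[0]
    return abs(cross) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
```

A straight segment gives a value of zero, and a right-angle turn gives a value just below one.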

Based on both the residual and the curvature, the UKF noise parameters are dynamically adjusted. The process noise adjustment is defined as follows:

Q_{k} = Q_{base} \cdot (1 + η_{q} \cdot ‖ Δ_{k} ‖ \cdot κ_{k})

(52)

The measurement noise is adjusted as follows:

R_{k} = R_{base} \cdot \max (0.5, 1 - η_{r} \cdot ‖ Δ_{k} ‖)

(53)

where $η_{q} = 0.5$ and $η_{r} = 0.3$ denote the learning rates, and $Q_{base}$ and $R_{base}$ represent the reference noise matrices. $‖ \cdot ‖$ represents the L2 norm of a vector, that is, the Euclidean norm.

An adaptive threshold is introduced to process the observation residuals, thereby enhancing the system’s robustness. The threshold is adaptively adjusted according to the curvature variation:

τ_{k} = τ_{\max} - (τ_{\max} - τ_{\min}) \cdot \min (1, γ \cdot κ_{k})

(54)

where $τ_{\max} = 3.0$ m and $τ_{\min} = 1.0$ m denote the upper and lower bounds of the threshold, and $γ = 1.5$ denotes the curvature sensitivity coefficient. The observation residuals are truncated at this threshold and subsequently used for UKF updates.
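The feedback rules in Equations (52)–(54) can be collected into three small functions. The function names are hypothetical, and element-wise clipping is only one possible reading of "truncated", since the source does not state how the truncation is applied to a vector of residuals.

```python
import numpy as np

def adapt_noise(Q_base, R_base, delta, kappa, eta_q=0.5, eta_r=0.3):
    """Eqs. (52)-(53): scale Q and R from the correction residual delta."""
    d = float(np.linalg.norm(delta))
    Q = Q_base * (1.0 + eta_q * d * kappa)
    R = R_base * max(0.5, 1.0 - eta_r * d)
    return Q, R

def residual_threshold(kappa, tau_max=3.0, tau_min=1.0, gamma=1.5):
    """Eq. (54): the threshold shrinks from tau_max to tau_min with curvature."""
    return tau_max - (tau_max - tau_min) * min(1.0, gamma * kappa)

def truncate_residual(residual, tau):
    """Clip each residual component to [-tau, tau] (an assumed reading)."""
    return np.clip(residual, -tau, tau)
```

On a straight path (zero curvature) the threshold stays at its upper bound and $Q$ is unchanged, while larger curvature tightens the threshold and, together with a larger residual, inflates $Q$.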

Training samples are generated using a sliding window, with 80% of the data assigned for training and 20% for validation. Each sample comprises the fused feature vectors of $L_{w}$ consecutive time steps. BiLSTM training is performed using a mean squared error loss $L$ and the Adam optimizer, with an initial learning rate of $10^{- 3}$ and a weight decay of $10^{- 5}$. The loss is defined as:

L = \frac{1}{N} \sum_{n = 1}^{N} ‖ p_{n}^{true} - p_{n}^{corr} ‖^{2}

(55)

where $n$ is the index of a training sample, and $N$ is the total number of samples. $p_{n}^{true}$ denotes the actual position vector and $p_{n}^{corr}$ denotes the corrected position vector. The entire model is optimized iteratively for up to five iterations. Each iteration encompasses the complete stages A–C, using the root mean square error (RMSE) on the validation set as the early stopping criterion.

The UKF-BiLSTM model operates in three stages: the CTRV-UKF provides initial estimates based on the physical model; the BiLSTM learns temporal error patterns for intelligent correction; and the bidirectional closed loop enables dynamic coordination between filtering and learning. This design fully exploits the complementary strengths of model-driven and data-driven approaches, improving adaptive capability while maintaining physical interpretability.

5. Experimental Analysis

5.1. Dataset Description

A. NLOS recognition dataset.

The NLOS recognition experiment utilizes an open-source dataset [29], which contains LOS and NLOS signal measurements from seven indoor environments. From each environment, 3000 LOS and 3000 NLOS samples were collected, resulting in a total of 42,000 samples (21,000 per category). To rigorously evaluate the model’s generalization to unseen environments and prevent data leakage, we adopted a strict environment-level hold-out strategy. Data from six environments were used for training, while data from the remaining environment (Office 2) were held out exclusively for testing. This ensures the test environment is physically distinct from all training environments. Within the training set, samples were randomly permuted to prevent overfitting to specific spatial configurations. All preprocessing steps were fitted solely on the training data before being applied to the test set, further preventing information leakage.
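The environment-level hold-out protocol can be sketched as below. The function name `environment_holdout`, the integer environment labels, and the use of standardization as the preprocessing step are assumptions for illustration; the source states only that preprocessing was fitted on the training data alone.

```python
import numpy as np

def environment_holdout(X, y, env, test_env, rng=None):
    """Split by environment and fit standardization on the training part only.

    X: (N, d) features; y: (N,) labels; env: (N,) environment label per sample.
    """
    rng = np.random.default_rng() if rng is None else rng
    test = env == test_env
    # Randomly permute the training samples from the remaining environments.
    train_idx = rng.permutation(np.flatnonzero(~test))
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_te, y_te = X[test], y[test]
    # Statistics come from the training set, then are applied to the test set.
    mu = X_tr.mean(axis=0)
    sd = X_tr.std(axis=0) + 1e-12
    return (X_tr - mu) / sd, y_tr, (X_te - mu) / sd, y_te
```

Fitting the scaling statistics before the split, or on all data, would let information from the held-out environment leak into training.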

B. NLOS error mitigation dataset.

The NLOS error mitigation experiment uses an open-source indoor UWB positioning and tracking dataset released under a Creative Commons Attribution 4.0 International (CC BY 4.0) license by Klemen Bregar in 2023 [30]. Data were collected in four indoor environments: two residential homes, one industrial workshop, and one office. Along each predefined and uniformly spaced sampling path, pedestrian motion was emulated using 80–85 measurement points. In Environment 2, anchor A6 detached from its wall mount, resulting in deviations between the measured ranges and the true Euclidean distances. Consequently, all invalid measurements associated with anchor A6 at 11 sampling positions in this environment were excluded from the analysis.

Table 1 summarizes the characteristics of the four environments. Given the distinct signal propagation characteristics across environments, different preprocessing strategies were applied. Environment 0 and Environment 1 exhibit relatively mild interference, so only basic bias compensation was used. In contrast, Environment 2 and Environment 3 required more extensive preprocessing due to severe interference and pronounced NLOS effects.

5.2. NLOS Identification Experiments

5.2.1. Configuration of Baseline Neural Network Models

This study compares two baseline neural network models: BP and MBP. Both use the same network dimensions and training configurations to ensure a fair comparison, as shown in Table 2. BP uses a fully connected structure. MBP adds an MHSA mechanism after the first hidden layer, along with residual connections and layer normalization, which improves training stability and captures complex feature dependencies more effectively than BP.

5.2.2. Parameter Settings of Intelligent Optimization Algorithms

Table 3 compares the parameter configurations of PSO and ISO. Both use identical population sizes, iteration numbers, parameter search ranges, ICMIC chaotic initialization, and DE strategies. The main difference lies in their optimization mechanisms. PSO updates particle positions through a velocity-update rule that balances inertia, cognitive learning, and social influence. In contrast, SO mimics snake foraging behavior, employing temperature-driven exploration decay and gender-based grouping to diversify search patterns through simulated mating and competition.

5.2.3. NLOS Classification Results

Figure 8 presents the confusion matrices of the six models (BP, MBP, ISO-BP, RF, IPSO-MBP, and ISO-MBP), with NLOS taken as the positive class. As shown, the other models exhibit various limitations in NLOS detection accuracy, while the proposed ISO-MBP demonstrates the most pronounced improvement.

Table 4 summarizes the performance of six models on the NLOS recognition task. The baseline BP model yields relatively low performance, achieving 86.12% accuracy and an 86.13% F1-score, which reflects its limited ability to capture complex NLOS characteristics. Introducing MBP improves the accuracy and F1-score to 88.75% and 88.94%, respectively, indicating a more effective modeling of UWB feature correlations. However, model parameters and training time increase by nearly an order of magnitude, highlighting a trade-off between performance and computational cost. While maintaining the same parameter size as BP (13.1 K), the ISO-BP model increases accuracy to 89.03% and achieves an F1-score of 88.91%. Moreover, the training time is reduced by 11.2% compared to BP, decreasing from 463 s to 411 s. These results demonstrate that ISO effectively optimizes the initial weights of conventional BP networks, achieving improved recognition performance with reduced training cost. Despite their advantages in parameter efficiency and training speed, traditional methods like RF suffer from higher inference latency and, most importantly, inferior recognition performance compared to MBP and its optimized versions. This gap indicates their limited adaptability to complex NLOS conditions. Intelligent optimization further enhances performance: IPSO-MBP improves accuracy and F1-score over MBP but at the cost of increased training time, reflecting higher computational overhead. ISO-MBP achieves the best results, reaching 91.62% accuracy, 90.92% recall, and 91.67% F1-score. Additionally, while achieving superior recognition performance, ISO-MBP reduces training time by 18.6% compared with IPSO-MBP. This advantage stems from the SO mechanism, which dynamically regulates the search process through temperature decay and gender-based grouping. 
As a result, a better balance between global exploration and local exploitation is achieved, improving convergence efficiency without compromising optimization accuracy. Overall, ISO-MBP delivers superior performance by balancing recognition accuracy and computational efficiency.

To assess its overall performance, ISO-MBP is compared with representative DL-based methods. Compared with FCN-Attention [31], ISO-MBP achieves higher accuracy (91.62% vs. 88.24%), precision (92.46% vs. 85.85%), and F1-score (91.67% vs. 88.62%). Although its recall is slightly lower (90.92% vs. 91.56%), the higher F1-score indicates a more balanced overall performance. In addition, ISO-MBP is substantially more efficient, with only 103.7 K parameters (9.4% of FCN-Attention) and lower inference latency (0.16 ms vs. 0.189 ms). Compared with DMOCF [32], ISO-MBP achieves slightly lower recognition metrics, although its accuracy and F1-score remain above 91%. It is, however, far more compact, with only about 0.69% of the parameters of DMOCF and a low inference latency of 0.16 ms, making it well-suited for resource-constrained and real-time applications. In summary, ISO-MBP achieves the best performance among the internal models and, compared with recent DL methods, maintains high accuracy while reducing model complexity and inference time.
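The statement that a higher F1-score reflects a better balance follows from its definition as the harmonic mean of precision and recall. The short check below uses only the percentages quoted above; the function name `f1_score` is illustrative, and small differences in the last digit are expected because the quoted precision and recall are already rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Precision and recall quoted above for ISO-MBP and FCN-Attention.
iso_mbp_f1 = f1_score(0.9246, 0.9092)
fcn_attention_f1 = f1_score(0.8585, 0.9156)
```

Both values agree with the reported F1-scores (91.67% and 88.62%) to within rounding of the inputs.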

5.3. NLOS Error Correction Experiments

5.3.1. Unified Configuration of Comparative Models

A unified experimental setup, detailed in Table 5, is used to compare the three positioning methods. Specifically, the number of UWB anchors configured in each environment determines the ranging feature dimensionality: Env0, Env1, and Env3 each have 8 anchors, whereas Env2 has 7. For all setups, the UKF state vector, based on the CTRV motion model, includes five variables: the two position coordinates, linear velocity, heading angle, and angular velocity. Historical position features refer to the model’s position outputs from previous time steps, which are fed back as inputs for the current estimate.

5.3.2. Core Model Parameter Settings

Table 6 lists the core BiLSTM hyperparameters and complete UKF filter settings. All parameters were optimized independently for each of the four experimental environments.

5.3.3. Baseline Performance Evaluation

Figure 9a–d show RMSE distributions for different anchors across four environments (Env0–Env3). These figures illustrate the performance stability of each method under varying spatial layouts. For quantitative analysis, Tables 7–10 report the RMSE values for each anchor across all four environments.

In the relatively simple residential environments (Env0/1), overall interference is low, and localization errors mainly arise from system noise and minor multipath effects. The Chan–Taylor method performs better in the compact, concrete-walled Env1 (RMSE 0.21–0.32 m) but shows larger errors in the larger Env0 with more interior walls (0.27–0.65 m). CNN-LSTM is stable in Env1 (0.13–0.18 m) but fluctuates significantly in Env0, with errors exceeding 0.77 m at points A2 and A5. BiLSTM shows uniform performance in Env1 (0.12–0.21 m) but exhibits notable error peaks in Env0 at points such as A3 and A5 (e.g., 0.4202 m at A5). Traditional UKF is stable in both environments, but its accuracy is higher in Env1 (RMSE 0.13–0.18 m) than in Env0 (0.19–0.24 m). The proposed UKF-BiLSTM fusion model achieves the most consistent and accurate results, maintaining RMSE within 0.11–0.18 m in Env0 and 0.06–0.11 m in Env1. Differences in performance among the methods in low-interference environments show their adaptability to environmental physical features. The accuracy of the Chan–Taylor method depends on the physical characteristics of the environment. In Env1, the compact space and concrete walls reduce multipath effects, making the environment closer to the assumptions of the Chan–Taylor model and resulting in a more concentrated error distribution. In Env0, the larger space with mostly gypsum walls leads to additional weak reflections that are not modeled by the Chan–Taylor method. These unmodeled paths lead to error accumulation during iterative estimation. DL models depend on the regularity of training data. The uniform physical structure in Env1 provides regular training data, allowing CNN-LSTM and BiLSTM to perform well. However, the more complex layout of Env0 introduces unstructured noise and occlusion variations. Facing unstructured noise, CNN-LSTM struggles to extract stable CIR spatial features, resulting in larger fluctuations in localization output. 
During training, BiLSTM learns sparse anomalies, such as minor occlusions at point A5, as temporal patterns, causing notable error peaks. The conventional UKF is robust in low-interference environments, but its fixed noise parameters cannot adapt online to local disturbances in Env0, which limits its achievable accuracy. The proposed UKF-BiLSTM addresses this limitation through a cooperative mechanism. The UKF provides physical constraints that reduce the risk of overfitting in the BiLSTM. Meanwhile, the BiLSTM analyzes UKF residuals to dynamically compensate for systematic biases not captured by the fixed parameters. This mechanism enables the fused model to achieve improved accuracy even in environments with mild interference.

In the industrial workshop environment (Env2), dense metal equipment causes strong specular reflections, resulting in numerous stable NLOS paths. Under these conditions, the Chan–Taylor geometric method degrades substantially, with RMSE values generally exceeding 0.75 m. The CNN-LSTM and BiLSTM models show comparable performance, with RMSE ranges of 0.21–0.37 m and 0.21–0.26 m, respectively. The conventional UKF remains stable at most locations, but the error increases to 0.4442 m at anchor A7, where strong reflections are present. In contrast, the proposed UKF-BiLSTM maintains consistent performance across all anchors, with RMSE values within 0.16–0.20 m. These performance differences can be attributed to how each algorithm interacts with the characteristics of the workshop environment, where reflection strength varies across anchor locations. The Chan–Taylor method degrades due to strong multipath effects. Its geometric formulation assumes that range measurements approximate true geometric distances. In industrial environments, specular reflections from metallic equipment distort the first path through multipath superposition, resulting in systematic positive bias and spatial correlation in the measurements. Such non-Gaussian errors violate the least-squares assumption and result in unstable solutions. Among the DL models, CNN-LSTM uses convolutional layers to extract local delay features from CIR waveforms that arise from reflections. BiLSTM exploits historical sequences to learn the temporal evolution of systematic biases caused by persistent reflection paths. Both models can capture such structured interference patterns from data, resulting in similar performance levels. The limitation of the conventional UKF lies in its predefined noise covariance matrix, which does not reflect spatial variations in interference strength. 
At anchor A7, where reflections are severe, the globally fixed observation noise underestimates the actual uncertainty, causing the filter to over-trust contaminated measurements. In this environment, the UKF-BiLSTM framework addresses anchor-dependent reflection variations by using BiLSTM to analyze UKF residuals and adjust the observation noise dynamically. This mechanism stabilizes filtering at highly disturbed anchors and provides feedback for subsequent iterations.

In the office environment (Env3), signal propagation is affected by wall penetration loss and indoor scattering. The Chan–Taylor method cannot compensate for nonlinear attenuation, resulting in RMSE values of 0.34–0.50 m. The BiLSTM shows large fluctuations, with errors exceeding 1.0 m at multiple anchors, whereas CNN-LSTM achieves RMSE values of 0.18–0.29 m. The overall accuracy of the conventional UKF is limited, with RMSE values ranging from approximately 0.34 to 0.51 m. The proposed UKF-BiLSTM achieves a clear improvement, reducing RMSE at all anchors to 0.08–0.12 m. These differences indicate varying abilities of the methods to handle deterministic system bias and unstructured random interference. Penetration loss acts as a global systematic bias, whereas scattering caused by complex layouts is local and random. The Chan–Taylor method is based on LOS propagation and simple path-loss assumptions. Nonlinear wall penetration is inconsistent with the model, and indoor scattering violates the LOS condition, leading to systematic estimation bias. The robustness of CNN-LSTM stems from its convolutional layers extracting delay features of the strongest paths from the CIR. These features are insensitive to amplitude variations and are therefore robust to penetration loss. In contrast, BiLSTM, as a purely temporal model, relies on the assumption of strong regularity in historical sequences. Errors introduced by random scattering in office environments lack stable temporal patterns, leading to overfitting during training and performance degradation at test time. The UKF-BiLSTM model jointly addresses random interference and systematic bias: UKF provides a physically constrained baseline to suppress random scattering, while BiLSTM analyzes UKF residuals to compensate for penetration-induced systematic errors. This complementary design enables the system to suppress random noise while mitigating systematic errors.

Experiments show that UWB positioning errors mainly arise from environment-related systematic biases and spatial heterogeneity, rather than random noise. The proposed UKF-BiLSTM framework combines the physical constraints of UKF with BiLSTM-driven residual correction to suppress systematic errors and random disturbances. Results indicate that the method maintains stable and accurate positioning in complex environments with multipath, penetration loss, and random scattering.

Figure 10 compares the positioning error distributions of five algorithms across various indoor environments using cumulative distribution functions (CDFs). In Figure 10a, the CDF curves in Env0 differ markedly between algorithms, showing their varying abilities to suppress long-tailed localization errors. The Chan–Taylor method has the flattest CDF and the broadest error distribution, with maximum errors exceeding 3 m. This shows high sensitivity to minor multipath effects and system noise, and a lack of effective mechanisms for bias compensation. CNN-LSTM converges in the low-to-medium error range, but its CDF curve is clearly right-shifted. At the 90th percentile, its error remains higher than the fusion model’s, suggesting less robustness to random ranging disturbances. BiLSTM shows a pronounced long-tail effect, with its CDF curve flattening notably beyond the medium-error region. This suggests that, without physical constraints, data-driven models may misinterpret temporal noise as valid state changes, leading to error accumulation. UKF, with its physical motion model constraints, remains robust in the moderate-error range but exhibits noticeable tailing at higher errors. This indicates the limited adaptability of filtering strategies that use fixed noise statistics under complex conditions. In contrast, UKF-BiLSTM converges rapidly in the low-error region, has the most concentrated error distribution, and strongly suppresses the long tail of errors. These results show that the bidirectional mutual calibration mechanism effectively reduces both random disturbances and systematic biases.
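For readers who want to reproduce this kind of analysis, the following minimal sketch shows how an empirical CDF value and a percentile error can be computed from a list of positioning errors. The function names and the sample errors are hypothetical and are not data from the paper; the percentile uses the nearest-rank convention, which is one of several common definitions.

```python
def empirical_cdf(errors, threshold):
    """Fraction of positioning errors that are <= threshold."""
    return sum(e <= threshold for e in errors) / len(errors)


def percentile_error(errors, pct):
    """Nearest-rank percentile: smallest error whose CDF reaches pct percent."""
    ranked = sorted(errors)
    rank = max(1, -(-pct * len(ranked) // 100))  # integer ceiling of pct * n / 100
    return ranked[rank - 1]


# Hypothetical error samples in metres, with one long-tail outlier.
errors = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.45, 0.60, 1.20]

share_within_half_metre = empirical_cdf(errors, 0.5)
p90 = percentile_error(errors, 90)
```

A steep CDF corresponds to a high `empirical_cdf` value at small thresholds, while a long tail shows up as a large gap between the 90th-percentile error and the maximum error.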

As shown in Figure 10b, the CDF curves of all algorithms in Env1 shift markedly toward lower error values and exhibit a substantially narrower distribution than in Env0. This indicates more stable signal propagation, with limited multipath effects and reduced obstruction-induced interference. The Chan–Taylor method shows a more concentrated error distribution, with a maximum error of about 1.4 m. Thus, under weak interference, geometric solutions can partially average out random measurement noise, though their accuracy remains fundamentally constrained by the assumption of unbiased LOS measurements. CNN-LSTM and BiLSTM exhibit similar CDF profiles across the low-to-medium error range, with a reduced long-tail effect, indicating that temporal features are more readily captured in consistent interference environments. BiLSTM nevertheless shows tailing in the medium-to-high error range, reflecting limited adaptability to noise variations. The UKF curve remains stable in the mid-error range but shows a slight, persistent tail beyond about 0.4 m, indicating that fixed process and observation noise covariances (Q and R) cannot fully capture subtle fluctuations in ranging errors. In contrast, UKF-BiLSTM maintains the steepest ascent across the error range, with the lowest 90th-percentile error, indicating effective suppression of residual uncertainty.

As shown in Figure 10c, the Chan–Taylor method exhibits near-complete performance degradation in Env2. Its CDF curve rises very slowly in the low-cumulative-probability region and shows a pronounced long tail in the high-error range. This behavior arises from the strict dependence of geometric localization on an unbiased LOS assumption: systematic NLOS biases induced by strong reflections in industrial environments cannot be effectively mitigated by geometric averaging, so the estimated positions exhibit a consistent offset from the ground truth. In comparison, CNN-LSTM performs comparably to the fusion model in the low-error range (approximately 0–0.3 m), indicating its ability to capture local feature patterns under relatively stable reflection structures. However, pronounced tailing remains in the high-error range, suggesting the limited generalization capability of purely data-driven models under complex noise conditions. Both UKF and BiLSTM also exhibit noticeable error tailing in this environment. The UKF relies on fixed process and observation noise covariances. This inherent limitation hinders its ability to maintain long-term statistical consistency in environments with strong reflections and rapidly varying noise, occasionally leading to suboptimal estimates. As a purely temporal model, BiLSTM tends to misinterpret anomalous ranging errors as patterns of state evolution, leading to progressive error accumulation. In contrast, UKF-BiLSTM enables online calibration during filtering, reducing residual errors under interference.
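The effect of a positive NLOS bias on a geometric solver can be illustrated with a small sketch. It uses a plain linearized least-squares multilateration as a simplified stand-in, not the Chan–Taylor algorithm evaluated in the paper, and the anchor layout, tag position, and 0.5 m bias are hypothetical values chosen for illustration.

```python
import math


def linear_ls_position(anchors, ranges):
    """Linearized least-squares 2D position from anchor coordinates and ranges.

    Each range equation is differenced against the first anchor, which removes
    the quadratic terms and leaves a linear system in (x, y).
    """
    (x1, y1), d1 = anchors[0], ranges[0]
    s11 = s12 = s22 = t1 = t2 = 0.0  # normal-equation accumulators
    for (xi, yi), di in zip(anchors[1:], ranges[1:]):
        ax, ay = 2.0 * (xi - x1), 2.0 * (yi - y1)
        b = d1 * d1 - di * di + xi * xi + yi * yi - x1 * x1 - y1 * y1
        s11 += ax * ax
        s12 += ax * ay
        s22 += ay * ay
        t1 += ax * b
        t2 += ay * b
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Hypothetical 10 m x 10 m room with four corner anchors and a tag at (3, 4).
anchors = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
truth = (3.0, 4.0)
exact = [math.hypot(truth[0] - ax, truth[1] - ay) for ax, ay in anchors]

# A reflected path lengthens one range: add a 0.5 m positive bias to anchor 2.
biased = list(exact)
biased[1] += 0.5

exact_est = linear_ls_position(anchors, exact)
biased_est = linear_ls_position(anchors, biased)
```

With unbiased ranges the estimate coincides with the true position, whereas the single biased range shifts the solution by a few tenths of a metre. Repeating the measurement does not remove this shift, because the bias is systematic rather than zero-mean.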

As shown in Figure 10d, the Chan–Taylor method improves in Env3, as indicated by a leftward shift in its CDF curve relative to Env2. This demonstrates that random multipath fluctuations are more amenable to partial suppression through geometric averaging than systematic NLOS biases. However, the error distribution remains dispersed with a pronounced high-error tail, reflecting the limits of geometric models in complex, penetration-dominated environments. CNN-LSTM maintains stable performance in this context, highlighting that convolutional structures are robust to material changes when extracting local spatial features. Nevertheless, they still cannot suppress anomalous ranging errors caused by random penetration effects in the high-error region. BiLSTM’s performance drops significantly in Env3, with the most pronounced error tailing, primarily due to multipath interference from gypsum partitions that exhibit strong, random temporal correlations. BiLSTM absorbs short-term anomalous-range errors into state predictions, amplifying long-term deviations. Meanwhile, the UKF-BiLSTM fusion model sustains a steep CDF curve with concentrated errors, demonstrating effective error suppression across disturbance types.

CDF analysis across four environments shows that the UKF-BiLSTM fusion framework has the most concentrated error distribution and the lowest long-tail probability. This confirms its effectiveness in improving positioning accuracy and highlights its strong ability to suppress large-error events. Thus, the framework achieves a favorable trade-off between high precision and strong robustness under complex and dynamic indoor propagation conditions.

Figure 11 illustrates the trajectory estimation results of the UKF-BiLSTM fusion model for simulated pedestrian motion with a constant step length and velocity across four indoor environments. Heatmaps of the GDOP are overlaid to reveal the relationship between environmental geometry and trajectory tracking performance.
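GDOP summarizes how anchor geometry amplifies ranging errors. The sketch below is a minimal illustration, assuming two-dimensional positioning from ranges only (no clock-bias term), so that GDOP reduces to the square root of the trace of the inverse of HᵀH, where H holds the unit vectors from each anchor to the tag. The function name and coordinates are hypothetical; the paper does not state how its heatmaps were computed.

```python
import math


def gdop_2d(position, anchors):
    """GDOP at `position` for range-only 2D positioning.

    Assumes the position does not coincide with any anchor.
    """
    x, y = position
    a = b = c = 0.0  # entries of H^T H = [[a, b], [b, c]]
    for ax, ay in anchors:
        dx, dy = x - ax, y - ay
        dist = math.hypot(dx, dy)
        ux, uy = dx / dist, dy / dist  # unit line-of-sight vector
        a += ux * ux
        b += ux * uy
        c += uy * uy
    det = a * c - b * b
    if det <= 1e-12:
        return math.inf  # degenerate geometry, e.g. tag collinear with all anchors
    # For a 2x2 matrix, trace(inverse) = (a + c) / det.
    return math.sqrt((a + c) / det)


# Hypothetical layouts: a tag centred among four corner anchors versus a tag
# far along the axis of three nearly collinear anchors.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
gdop_centre = gdop_2d((5.0, 5.0), square)
gdop_poor = gdop_2d((10.0, 0.1), [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)])
```

Low values correspond to anchors surrounding the tag, and high values to anchors clustered in one direction, which is the pattern the heatmaps are meant to expose.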

In Figure 11a, the estimated trajectory in Env0 closely matches the ground truth, indicating accurate and stable tracking. A slight trajectory bending is observed near anchor A7 (X ≈ 2–3, Y ≈ 8–9), where the GDOP heatmap shows low values (blue), reflecting a favorable geometric configuration. According to the room layout [30], anchor A7 is at a wall corner, and narrow passageways in this area are prone to transient NLOS errors. Nevertheless, no significant trajectory deviation is observed, demonstrating strong robustness against localized interference.

As shown in Figure 11b, Env1 features a more compact layout and denser anchor deployment than Env0. The proposed UKF-BiLSTM model maintains accurate trajectory tracking along the entire path, with only a slight lag observed in the terminal segment (from A6 to A7). This lag occurs in a region with elevated GDOP and increased observation uncertainty, where the trajectory updates become more conservative. As a result, the estimated path remains smooth and stable without noticeable deviation, indicating robust tracking performance under locally degraded measurement conditions.

As shown in Figure 11c for Env2, the estimated trajectory closely follows the ground-truth path throughout the entire motion sequence. Near the trajectory endpoint, where GDOP increases and the geometric configuration deteriorates, the model still exhibits no noticeable divergence or drift. This indicates that stable trajectory tracking is maintained even under limited anchor availability and degraded geometry.

As shown in Figure 11d, the estimated trajectory in Env3 contains a pronounced long-range return segment, indicating consistent state extrapolation during path reversals. The absence of trajectory straightening or contraction suggests that the model avoids ambiguity typically induced by loops or reversals. From a GDOP perspective, the main motion segment corresponds to low GDOP values (blue), indicating favorable geometry. Although elevated GDOP values (orange) are observed near anchor A2, no noticeable trajectory deviation occurs, demonstrating strong robustness under degraded geometric conditions.

Experimental results across four representative indoor environments indicate that the proposed UKF-BiLSTM framework consistently achieves high-precision, robust trajectory tracking. This performance is primarily attributable to the synergistic effect of two key design components. First, the CTRV model constrains a five-dimensional state comprising position, linear velocity, heading angle, and heading angular velocity, preventing non-physical abrupt transitions. Second, the bidirectional mutual calibration mechanism adaptively balances observation information and state propagation under NLOS conditions, degraded GDOP, or limited anchor redundancy. As a result, localized observation anomalies are effectively suppressed while stable and smooth trajectory convergence is maintained.
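The CTRV constraint can be made concrete with a sketch of the standard noise-free state transition. It assumes the common state ordering (x, y, v, heading, turn rate); the ordering, the function name, and the small-turn-rate threshold are assumptions for illustration, since this section does not give the paper's implementation.

```python
import math
from typing import Tuple

# Assumed state ordering: (x, y, speed, heading, turn_rate)
State = Tuple[float, float, float, float, float]


def ctrv_predict(state: State, dt: float, eps: float = 1e-6) -> State:
    """Propagate a CTRV state by dt seconds without process noise."""
    x, y, v, theta, omega = state
    if abs(omega) > eps:
        # Motion along a circular arc of radius v / omega.
        x_new = x + v / omega * (math.sin(theta + omega * dt) - math.sin(theta))
        y_new = y + v / omega * (math.cos(theta) - math.cos(theta + omega * dt))
    else:
        # Straight-line limit, which avoids dividing by a near-zero turn rate.
        x_new = x + v * dt * math.cos(theta)
        y_new = y + v * dt * math.sin(theta)
    return (x_new, y_new, v, theta + omega * dt, omega)


# Straight walk at 1 m/s for 2 s, then a quarter turn at 1 m/s over 1 s.
straight = ctrv_predict((0.0, 0.0, 1.0, 0.0, 0.0), dt=2.0)
quarter_turn = ctrv_predict((0.0, 0.0, 1.0, 0.0, math.pi / 2), dt=1.0)
```

Because speed and turn rate are carried over unchanged, any predicted position must lie on a smooth arc from the previous state, which is what rules out abrupt, non-physical jumps between consecutive estimates.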

Figure 12 and Table 11 show how localization accuracy changes with the number of anchors in four experimental environments. In general, increasing the number of anchors tends to improve accuracy, but the effect is not consistent for all environments. Environments 1, 2, and 3 reach their lowest RMSE with 4, 6, and 6 anchors, respectively. Adding more anchors beyond these points increases the RMSE. In contrast, Environment 0 achieves its lowest RMSE only when all eight anchors are used. This suggests that, in complex indoor scenarios, adding more anchors does not always improve localization accuracy. Anchors added beyond an environment-specific optimal number are often placed in positions with poor geometry or weak signal propagation. The resulting heterogeneous measurement errors can introduce conflicting constraints in the UKF-BiLSTM fusion, reducing localization accuracy. Therefore, the maximum achievable accuracy is determined not by the total number of anchors, but by an “optimal anchor subset” with consistent measurement quality and favorable geometry. In practice, the strategy should focus on selecting a high-quality subset of anchors rather than simply increasing the total number. It should be noted that the RMSE values reported in this section are obtained under different anchor availability conditions and are used to analyze the effect of anchor number on localization performance.
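The optimal anchor counts quoted above can be read directly off Table 11. The short sketch below simply encodes the table's RMSE values (in metres, with unavailable configurations omitted) and picks the minimum per environment; the variable and function names are hypothetical.

```python
# RMSE (m) from Table 11, keyed by environment and then by number of anchors.
rmse_by_anchor_count = {
    0: {3: 1.2843, 4: 1.2756, 6: 0.2293, 8: 0.1800},
    1: {3: 0.2195, 4: 0.1729, 6: 0.1887, 8: 0.2129},
    2: {3: 1.1390, 4: 0.6590, 6: 0.4681, 7: 0.6249},
    3: {3: 2.1912, 4: 1.2607, 6: 0.3775, 8: 0.4219},
}


def best_anchor_count(rmse_by_count):
    """Return the anchor count with the lowest RMSE."""
    return min(rmse_by_count, key=rmse_by_count.get)


best = {env: best_anchor_count(rmse) for env, rmse in rmse_by_anchor_count.items()}
```

Here `best` maps Env0 to 8 anchors, Env1 to 4, and Env2 and Env3 to 6, which makes the non-monotonic relationship between anchor count and accuracy explicit.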

5.3.4. Comparative Analysis with State-of-the-Art Algorithms

To comprehensively assess the performance of the proposed UKF-BiLSTM bidirectional fusion method, this section compares it with several state-of-the-art (SOTA) algorithms: F-BERT [21], STA-GNN-M [23], CNN-LSTM-DEKF [10], and AR-PNN [33]. All comparison experiments use the full set of anchors, with all available ranging measurements applied for model training and testing. This setup differs from the subset experiments in the previous section, which used only selected anchors to examine how anchor availability affects localization accuracy. The comparison results are summarized in Table 12.

F-BERT reported RMSE values of 0.1029–0.2652 m in controlled static laboratory scenarios (STA-1 to STA-4). The proposed method was instead validated on a public dataset, achieving RMSE values of 0.1024–0.2106 m across four heterogeneous environments. Under these stricter testing conditions, it achieved localization accuracy comparable to F-BERT in most environments (Env0, Env1, and Env3). In the industrial environment with the strongest NLOS interference (Env2, RMSE = 0.2106 m), the model maintained stable and usable localization performance.

For the comparison with STA-GNN-M, four NLOS test sequences (N-4, N-9, N-1, N-7) were selected, covering a range of NLOS conditions. In Env1 and Env0, which have lighter NLOS conditions, our method achieved RMSEs of 0.1158 m and 0.1389 m, lower than the RMSEs of STA-GNN-M on sequences of similar difficulty (N-4: 0.140 m, N-9: 0.150 m). In Env2, which has stronger NLOS conditions, our method achieved an RMSE of 0.2106 m, comparable to STA-GNN-M on the high-interference sequence N-1 (0.224 m). The proposed model has approximately 81 k–143 k parameters, fewer than the roughly 266 k parameters of STA-GNN-M, reducing model size and computational cost for resource-constrained applications.

CNN-LSTM-DEKF, which also combines DL with filtering, achieved an RMSE of 0.205 m and a mean absolute error (MAE) of 0.192 m in a private laboratory environment. In Env3, which has similar conditions, our method achieved an RMSE of 0.1024 m and an MAE of 0.0761 m, about 50% and 60% lower than CNN-LSTM-DEKF, respectively. Across the four environments with varying NLOS conditions, our method maintained RMSEs between 0.1024 m and 0.2106 m.
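The relative reductions for Env3 can be checked directly from the values in Table 12. The calculation below is only a worked restatement of those numbers, for RMSE and MAE respectively:

$$\frac{0.205 - 0.1024}{0.205} \approx 0.500, \qquad \frac{0.192 - 0.0761}{0.192} \approx 0.604$$

This corresponds to an RMSE roughly 50% lower and an MAE roughly 60% lower, keeping in mind that the two methods were evaluated on different datasets.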

In Env0, AR-PNN achieved an RMSE of 0.14 m, and its CDF shows that 80% of errors are ≤ 0.5 m and 95% are ≤ 1.0 m. Our method achieved an RMSE of 0.1389 m, with 95.29% of errors ≤ 0.5 m, 100% ≤ 1.0 m, and a 90th-percentile error of 0.2475 m (Figure 10a). Moreover, AR-PNN was tested in only one environment, whereas our method was evaluated in four.

In summary, UKF-BiLSTM achieves high accuracy, low complexity, and strong generalization across the four environments of the public dataset. Compared with the DL-based and optimization-based methods above, it offers comparable or better accuracy, with fewer parameters than STA-GNN-M, and is therefore suitable for complex NLOS scenarios.

6. Conclusions

To improve UWB indoor positioning systems, two complementary strategies are proposed: NLOS identification and error correction. The ISO-MBP model extends the BP classifier with an MHSA mechanism to enhance feature modeling and uses an improved snake optimization (ISO) algorithm for robust global optimization. The UKF-BiLSTM fusion framework combines motion-model constraints with temporal sequence analysis to enable accurate localization under NLOS conditions. Experimental results show that ISO-MBP achieves high NLOS identification accuracy, outperforming internal baseline models. UKF-BiLSTM maintains consistently low RMSE across four environments with diverse NLOS ratios. Its CDF curves rise steeply and its error distributions are concentrated, indicating effective suppression of both random disturbances and systematic biases. Compared with recent SOTA methods, UKF-BiLSTM achieves comparable localization accuracy with lower model complexity and inference time.

The ISO-MBP model improves LOS/NLOS classification, but its complex architecture increases training costs. The SO algorithm’s performance is also sensitive to parameters. While this study uses effective parameters, better results may be achieved through deeper hyperparameter exploration. The UKF-BiLSTM model, trained on static NLOS scenarios, does not capture dynamic LOS/NLOS transitions caused by human movement in real deployments. This gap may reduce trajectory stability during sudden obstructions.

Future research will advance along several interconnected directions. First, we will explore more lightweight self-attention architectures to further improve model efficiency. Second, we will expand the experimental dataset to include dynamic LOS/NLOS transitions, allowing a more comprehensive evaluation of the UKF–BiLSTM framework. Third, we will extend the framework to handle complex channel conditions, particularly those involving the superposition of direct and multipath signals. This will require the development of refined labeling strategies and feature fusion mechanisms to better characterize signal propagation behaviors.

Author Contributions

Y.W.: Writing—review and editing, Writing—original draft, Validation, Methodology, Investigation. Z.D.: Supervision, Funding acquisition, Conceptualization. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 72071183); Key Research and Development projects in Shanxi Province (No. 202202100401002); the University Technology Achievement Digitalization and Transformation Platform Development and Application Project (YDZJSX2025A001); and the Technical Research and Application Demonstration of an Intelligent Perception and Emergency Management System for Coal Silo Environments (202402100101007).

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Xu, X.; Zhang, C.; Peng, A. Enhanced propagation model constrained RSS fingerprints patching with map assistance for Wi-Fi positioning. Comput. Commun. 2023, 208, 200–209.
Galván-Tejada, C.E.; Carrasco-Jiménez, J.C.; Brena, R.F. Bluetooth-WiFi based combined positioning algorithm, implementation and experimental evaluation. Procedia Technol. 2013, 7, 37–45.
Wang, L.; Nie, B.; Zhang, R.; Zhai, S.; Li, H. ZigBee-based positioning system for coal miners. Procedia Eng. 2011, 26, 2406–2414.
Marano, S.; Gifford, W.M.; Wymeersch, H.; Win, M.Z. NLOS identification and mitigation for localization based on UWB experimental data. IEEE J. Sel. Areas Commun. 2010, 28, 1026–1035.
Barbieri, L.; Brambilla, M.; Trabattoni, A.; Mervic, S.; Nicoli, M. UWB localization in a smart factory: Augmentation methods and experimental assessment. IEEE Trans. Instrum. Meas. 2021, 70, 1–18.
Díez-González, J.; Ferrero-Guillén, R.; Verde, P.; Martínez-Gutiérrez, A.; Álvarez, R.; Torres-Sospedra, J. Time-based UWB localization architectures analysis for UAVs positioning in industry. Ad Hoc Netw. 2024, 157, 103419.
Yu, K.; Wen, K.; Li, Y.; Zhang, S.; Zhang, K. A novel NLOS mitigation algorithm for UWB localization in harsh indoor environments. IEEE Trans. Veh. Technol. 2018, 68, 686–699.
Tian, Y.; Lian, Z.; Wang, P.; Wang, M.; Yue, Z.; Chai, H. Application of a long short-term memory neural network algorithm fused with Kalman filter in UWB indoor positioning. Sci. Rep. 2024, 14, 1925.
Eang, C.; Lee, S. An integration of deep neural network-based extended Kalman filter (DNN-EKF) method in ultra-wideband (UWB) localization for distance loss optimization. Sensors 2024, 24, 7643.
Zhou, Z.; Xu, Z.; Xia, J. Deep learning optimization positioning algorithm based on UWB/IMU fusion in complex indoor environments. Phys. Commun. 2025, 71, 102702.
Zhang, S.; Wang, E.; Zhu, Z.; Yi, J.; Wang, Y.; Kuai, E. UKF-FNN-RIC: A highly accurate UWB localization algorithm for TOA scenario. IEEE Trans. Instrum. Meas. 2024, 73, 8508013.
Zhang, C.; Bao, X.; Wei, Q.; Ma, Q.; Yang, Y.; Wang, Q. A Kalman filter for UWB positioning in LOS/NLOS scenarios. In Proceedings of the 2016 International Conference on Ubiquitous Positioning, Indoor Navigation and Location-Based Services (UPINLBS), Shanghai, China, 2–4 November 2016.
Guo, Y.; Li, W.; Yang, G.; Jiao, Z.; Yan, J. Combining dilution of precision and Kalman filtering for UWB positioning in a narrow space. Remote Sens. 2022, 14, 5409.
Xu, Y.; Shmaliy, Y.S.; Ahn, C.K.; Tian, G.; Chen, X. Robust and accurate UWB-based indoor robot localisation using integrated EKF/EFIR filtering. IET Radar Sonar Navig. 2018, 12, 750–756.
Lyu, Y.; Wei, M.; Li, S.; Wang, D. A fusion positioning system with environmental-adaptive algorithm: IPSO-IAUKF fusion of UWB and IMU for NLOS noise mitigation. Meas. Sens. 2025, 38, 101864.
Feng, D.; Wang, C.; He, C.; Zhuang, Y.; Xia, X.-G. Kalman-filter-based integration of IMU and UWB for high-accuracy indoor positioning and navigation. IEEE Internet Things J. 2020, 7, 3133–3146.
Wang, Y.; Li, X. The IMU/UWB fusion positioning algorithm based on a particle filter. ISPRS Int. J. Geo-Inf. 2017, 6, 235.
Han, Y.; Wei, C.; Li, R.; Wang, J.; Yu, H. A novel cooperative localization method based on IMU and UWB. Sensors 2020, 20, 467.
Poulose, A.; Han, D.S. UWB indoor localization using deep learning LSTM networks. Appl. Sci. 2020, 10, 6290.
Nguyen, D.T.A.; Lee, H.-G.; Jeong, E.-R.; Lee, H.L.; Joung, J. Deep learning-based localization for UWB systems. Electronics 2020, 9, 1712.
He, S.; Yang, B.; Liu, T.; Zhang, H. Multi-Tag UWB Localization with Spatial-Temporal Attention Graph Neural Network. IEEE Trans. Instrum. Meas. 2024, 73, 2531112.
Tang, K.; Yang, B.; Ding, K. Deep attention-based network combing geometric information for UWB localization in complex indoor environments. IEEE Access 2024, 12, 31488–31497.
Yang, H.; Wang, Y.; Seow, C.K.; Sun, M.; Coene, S.; Huang, L.; Joseph, W.; Plets, D. Fuzzy Transformer Machine Learning for UWB NLOS Identification and Ranging Mitigation. IEEE Trans. Instrum. Meas. 2025, 74, 8503817.
Wang, E.; Wang, Y.; Zhang, S.; Xu, S.; Chen, Y.; Wang, Y.; Yu, T.; Lei, H. High-Precision UWB TDOA Localization Algorithm Based on UKF-FNN-CHAN-RIC. IEEE Trans. Instrum. Meas. 2025, 74, 8506813.
Ren, M.; Wei, J.; Qin, J.; Guo, X.; Wang, H.; Li, S. Attention based LSTM framework for robust UWB and INS integration in NLOS environments. Sci. Rep. 2025, 15, 21637.
Muthineni, K.; Artemenko, A.; Abode, D.; Vidal, J.; Nájar, M. PosGNN: A Graph Neural Network Based Multimodal Data Fusion for Indoor Positioning in Industrial Non-Line-of-Sight Scenarios. IEEE Open J. Veh. Technol. 2025, 7, 15–26.
Decawave. DW1000 User Manual; DecaWave Limited: Dublin, Ireland, 2017.
Hashim, F.A.; Hussien, A.G. Snake Optimizer: A novel meta-heuristic optimization algorithm. Knowl.-Based Syst. 2022, 242, 108320.
Bregar, K.; Hrovat, A.; Mohorcic, M. NLOS channel detection with multilayer perceptron in low-rate personal area networks for indoor localization accuracy improvement. In Proceedings of the 8th Jožef Stefan International Postgraduate School Students’ Conference, Ljubljana, Slovenia, 31 May 2016; Jožef Stefan International Postgraduate: Ljubljana, Slovenia, 2016; pp. 1–8.
Bregar, K. Indoor UWB positioning and position tracking data set. Sci. Data 2023, 10, 744.
Pei, Y.; Chen, R.; Li, D.; Xiao, X.; Zheng, X. FCN-Attention: A deep learning UWB NLOS/LOS classification algorithm using fully convolutional neural network with self-attention mechanism. Geo-Spat. Inf. Sci. 2024, 27, 1162–1181.
Ma, Z.; Deng, Z.; Tian, Z.; Zhang, Y.; Wang, J.; Guo, J. A line-of-sight/non-line-of-sight recognition method based on the dynamic multi-level optimization of comprehensive features. Sensors 2025, 25, 304.
Liu, Y.; Hu, E.; Chen, Y.; Guo, C. Neurodynamic robust adaptive UWB localization algorithm with NLOS mitigation. Sci. Rep. 2025, 15, 14271.

Figure 1. CIR waveforms under LOS and NLOS conditions.

Figure 2. Distributions of FP_AMP1–FP_AMP3 under LOS and NLOS conditions. (a) FP_AMP1. (b) FP_AMP2. (c) FP_AMP3.

Figure 3. Noise statistics under LOS and NLOS conditions. (a) Maximum noise amplitude. (b) Noise standard deviation.

Figure 4. Distribution of CIR_PWR under LOS and NLOS.

Figure 5. ISO-MBP model flowchart.

Figure 6. Architecture of the BiLSTM.

Figure 7. UKF-BiLSTM model flowchart.

Figure 8. Confusion matrix plots of six models. (a) BP. (b) MBP. (c) ISO-BP. (d) RF. (e) IPSO-MBP. (f) ISO-MBP.

Figure 9. Comparison of anchor-level RMSE under four environments. (a) Env 0. (b) Env 1. (c) Env 2. (d) Env 3.

Figure 10. Comparison of positioning error CDFs under four environments. (a) Env 0. (b) Env 1. (c) Env 2. (d) Env 3.

Figure 11. Trajectories of UKF-BiLSTM under four environments. (a) Env 0. (b) Env 1. (c) Env 2. (d) Env 3.

Figure 12. Anchor-level RMSE of UKF-BiLSTM under varying numbers of anchors. (a) Env 0. (b) Env 1. (c) Env 2. (d) Env 3.

Table 1. Overview of the NLOS error mitigation dataset.

Env	Type	NLOS Conditions	Preprocessing
0	Large residential apartment; 9.18 × 12.06 m; brick exterior + plasterboard interior	Few NLOS, only 5 outliers.	Basic deviation compensation
1	Compact residential apartment; 3.60 × 6.69 m; concrete exterior + plaster interior	Minimal NLOS, no abnormal values.	Basic deviation compensation
2	Industrial workshop; 21.96 × 11.85 m; dense metal equipment	High NLOS, strong multipath.	Basic deviation compensation; DBSCAN denoising (metal reflection outliers); antenna delay compensation
3	Office; 15.37 × 11.50 m; concrete exterior + plasterboard partitions	Significant NLOS, 22 outliers.	Basic deviation compensation; DBSCAN denoising (partition NLOS outliers); antenna delay compensation

Table 2. Configuration of baseline models BP and MBP.

Parameter Category	Parameter	BP Model	MBP Model
Network architecture	Input layer	10	10
	Hidden layer 1	150	150
	Hidden layer 2	75	75
	Output layer	2	2
Training configuration	Attention mechanism	None	MHSA (6 heads)
	Optimizer	Adam	Adam
	Learning rate	0.001	0.001
	Training epochs	200	200

Table 3. Parameter settings of the hybrid optimization algorithms.

Parameter Category	Parameter	IPSO-MBP Model	ISO-MBP Model
Basic configuration	Population size ( $N$ )	10	10
	Maximum iterations ( $T$ )	150	150
	Parameter search range	$[- 0.5, 1.5]$	$[- 0.5, 1.5]$
Hybrid strategy	ICMIC chaotic map	$a = 0.9$	$a = 0.9$
	DE scaling factor	$μ = 0.1$	$μ = 0.1$
	DE crossover probability	$C R = 0.8$	$C R = 0.8$
	Specific parameters	$w = 0.7, c_{1} = c_{2} = 1.5$	$C_{1} = 0.5, C_{2} = 0.05, C_{3} = 2.0$

Table 4. Comparison of the performance of different algorithms.

Model	Accuracy	Precision	Recall	F1-Score	Model Parameters (K)	Training Time (s)	Inference Latency (ms)
BP	86.12%	87.62%	84.66%	86.13%	13.1	463	0.04
MBP	88.75%	88.87%	89.02%	88.94%	103.7	669	0.16
ISO-BP	89.03%	90.85%	87.21%	88.91%	13.1	411	0.07
RF	87.05%	86.10%	88.89%	87.47%	1.3	55	24.41
IPSO-MBP	89.88%	90.30%	89.74%	90.02%	103.7	1576	0.17
FCN-Attention [31]	88.24%	85.85%	91.56%	88.62%	1097.6	-	0.189
DMOCF [32]	93.50%	93.52%	93.70%	93.61%	15,090	-	-
ISO-MBP	91.62%	92.46%	90.92%	91.67%	103.7	1282	0.16

Table 5. Unified model configuration parameters.

Item		BiLSTM	CNN-LSTM	UKF-BiLSTM
Input features	UWB ranging	$N$ -dim	$N$ -dim	$N$ -dim
	UKF state	5-dim	5-dim	5-dim
	Historical position	2-dim	2-dim	2-dim
Total feature dimension		$N$ + 7	$N$ + 7	$N$ + 7
Training configurations	Learning rate	0.001	0.001	0.001
	Optimizer	Adam	Adam	Adam
	Training epochs	100	100	100
	Batch size	32	32	32
	Sequence length	10	10	10
	Hidden size	64	128	64
	Core architecture	Two-layer BiLSTM	CNN + three-layer LSTM	UKF–BiLSTM bidirectional mutual calibration

Table 6. Core parameters of BiLSTM and UKF.

Model	Parameter Category	Parameter Name	Env0	Env1	Env2	Env3
BiLSTM	Network structure	Number of layers	2	2	2	2
		Hidden layer size	64	64	64	64
		Sequence length	10	10	10	10
	Training config	Learning rate	0.001	0.001	0.001	0.001
		Batch size	32	32	32	32
		Training epochs	100	100	300	100
UKF	Initial state	Initial velocity	0.4 m/s	0.3 m/s	0.8 m/s	0.2 m/s
	Process noise	Velocity noise	0.15	0.1	0.3	0.08
	Process noise	Angular velocity noise	0.08	0.05	0.12	0.03
	Measurement noise	Measurement noise	0.25	0.15	1.4	0.3
	UKF params	Sigma params ( $α, β, γ$ )	(0.1, 2, −2)	(0.1, 2, −2)	(0.01, 2, 0)	(0.1, 2, −2)
	Dynamic calibration	Q/R rate	0.5/0.3	0.4/0.2	0.6/0.3	0.6/0.4

Table 7. Anchor-level RMSE in Env 0.

Anchor	UKF/m	BiLSTM/m	Chan-Taylor/m	CNN-LSTM/m	UKF-BiLSTM/m
A1	0.2228	0.0712	0.2653	0.5054	0.1108
A2	0.2423	0.2809	0.4636	0.7742	0.1368
A3	0.2292	0.3605	0.5717	0.6759	0.1505
A4	0.2318	0.4049	0.6099	0.6554	0.1681
A5	0.2151	0.4202	0.6192	0.8231	0.1774
A6	0.2088	0.3454	0.6387	0.4710	0.1572
A7	0.1896	0.2163	0.6537	0.5826	0.1435
A8	0.2228	0.2496	0.3718	0.3575	0.1308

Table 8. Anchor-level RMSE in Env 1.

Anchor	UKF/m	BiLSTM/m	Chan-Taylor/m	CNN-LSTM/m	UKF-BiLSTM/m
A1	0.1572	0.1999	0.3060	0.1700	0.0798
A2	0.1686	0.2074	0.3190	0.1263	0.1110
A3	0.1378	0.1173	0.2958	0.1452	0.0768
A4	0.1458	0.1214	0.2769	0.1479	0.0675
A5	0.1759	0.1688	0.2051	0.1617	0.0640
A6	0.1449	0.1542	0.2677	0.1830	0.0789
A7	0.1330	0.1528	0.2461	0.1838	0.0675
A8	0.1645	0.1950	0.2326	0.1678	0.0762

Table 9. Anchor-level RMSE in Env 2.

Anchor	UKF/m	BiLSTM/m	Chan-Taylor/m	CNN-LSTM/m	UKF-BiLSTM/m
A1	0.1720	0.2483	0.9699	0.2879	0.1573
A2	0.1943	0.2614	1.0433	0.3735	0.1874
A3	0.2394	0.2415	0.7547	0.2213	0.1906
A4	0.2359	0.2426	0.7684	0.2065	0.2032
A5	0.3163	0.2430	0.9395	0.2608	0.1933
A6	-	-	-	-	-
A7	0.4442	0.2172	0.8993	0.2469	0.1910
A8	0.4150	0.2113	1.1244	0.2792	0.1849

Table 10. Anchor-level RMSE in Env 3.

Anchor	UKF/m	BiLSTM/m	Chan-Taylor/m	CNN-LSTM/m	UKF-BiLSTM/m
A1	0.3942	0.5320	0.3629	0.2460	0.0795
A2	0.4419	1.0746	0.4576	0.2825	0.0964
A3	0.5085	1.4742	0.4954	0.2906	0.0924
A4	0.4570	1.1876	0.4439	0.2007	0.1123
A5	0.3361	0.6734	0.3446	0.1765	0.1070
A6	0.4036	0.5870	0.4151	0.2202	0.0775
A7	0.3378	1.4528	0.3445	0.2053	0.1068
A8	0.3559	0.6817	0.3631	0.2010	0.1153
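
As a reading aid for the anchor-level tables, the short sketch below averages the per-anchor RMSE values of Table 10 (Env 3) for each method and reports the relative reduction achieved by UKF-BiLSTM over the best baseline. The values are transcribed from the table; the helper names `column_means` and `relative_reduction` are illustrative. Note that the arithmetic mean of anchor-level RMSEs is only a summary statistic and is not necessarily equal to the overall RMSE reported elsewhere in the paper.

```python
from statistics import mean

# Anchor-level RMSE values (m) transcribed from Table 10 (Env 3), rows A1..A8.
METHODS = ["UKF", "BiLSTM", "Chan-Taylor", "CNN-LSTM", "UKF-BiLSTM"]
ROWS = [
    (0.3942, 0.5320, 0.3629, 0.2460, 0.0795),
    (0.4419, 1.0746, 0.4576, 0.2825, 0.0964),
    (0.5085, 1.4742, 0.4954, 0.2906, 0.0924),
    (0.4570, 1.1876, 0.4439, 0.2007, 0.1123),
    (0.3361, 0.6734, 0.3446, 0.1765, 0.1070),
    (0.4036, 0.5870, 0.4151, 0.2202, 0.0775),
    (0.3378, 1.4528, 0.3445, 0.2053, 0.1068),
    (0.3559, 0.6817, 0.3631, 0.2010, 0.1153),
]


def column_means(rows):
    """Arithmetic mean of each column (one column per method)."""
    return [mean(col) for col in zip(*rows)]


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


means = dict(zip(METHODS, column_means(ROWS)))
for name, value in means.items():
    print(f"{name:12s} mean anchor-level RMSE: {value:.4f} m")

# Best baseline = lowest mean among all methods except the proposed one.
best_baseline = min((m for m in METHODS if m != "UKF-BiLSTM"), key=means.get)
gain = relative_reduction(means[best_baseline], means["UKF-BiLSTM"])
print(f"Reduction vs. {best_baseline}: {gain:.1f}%")
```

The same pattern applies to Tables 7 to 9 by replacing `ROWS` with the corresponding values, skipping any anchor for which no result is reported.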

Table 11. Comparative RMSE (m) under varying anchor numbers in four environments.

Environment	3 Anchors	4 Anchors	6 Anchors	7 Anchors	8 Anchors
0	1.2843	1.2756	0.2293	-	0.1800
1	0.2195	0.1729	0.1887	-	0.2129
2	1.1390	0.6590	0.4681	0.6249	-
3	2.1912	1.2607	0.3775	-	0.4219

Table 12. Comparison with recent SOTA UWB indoor positioning methods.

Method	Dataset	Scenario/Subset	RMSE (m)	MAE (m)	Parameters (K)
F-BERT [21]	Private	STA-1	0.2537	-	-
		STA-2	0.2652
		STA-3	0.1337
		STA-4	0.1029
STA-GNN-M [23]	Private	N-4	0.140	-	266.5
		N-9	0.150
		N-1	0.224
		N-7	0.149
CNN-LSTM-DEKF [10]	Private	Laboratory	0.205	0.192	-
AR-PNN [33]	Public	Env0	0.14	-	-
Ours	Public	Env0	0.1389	0.1053	82.2
		Env1	0.1158	0.0891	141.1
		Env2	0.2106	0.1756	142.7
		Env3	0.1024	0.0761	81.2
