Article

An Indoor UAV Localization Framework with ESKF Tightly-Coupled Fusion and Multi-Epoch UWB Outlier Rejection

1 School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 School of Computer and Information, Dezhou University, Dezhou 253023, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(24), 7673; https://doi.org/10.3390/s25247673
Submission received: 11 November 2025 / Revised: 11 December 2025 / Accepted: 16 December 2025 / Published: 18 December 2025
(This article belongs to the Section Navigation and Positioning)

Abstract

Unmanned aerial vehicles (UAVs) are increasingly used indoors for inspection, security, and emergency tasks. Achieving accurate and robust localization under Global Navigation Satellite System (GNSS) unavailability and obstacle occlusions is therefore a critical challenge. Each candidate sensor has inherent physical limitations: Inertial Measurement Unit (IMU)-based localization errors accumulate over time; Ultra-Wideband (UWB) measurements suffer from systematic biases in Non-Line-of-Sight (NLOS) environments; and Visual–Inertial Odometry (VIO) depends heavily on environmental features, making it susceptible to long-term drift. We propose a tightly coupled fusion framework based on the Error-State Kalman Filter (ESKF). Using an IMU motion model for prediction, the method incorporates raw UWB ranges, VIO relative poses, and TFmini altitude in the update step. To suppress abnormal UWB measurements, a multi-epoch outlier rejection method constrained by VIO is developed, which can robustly eliminate NLOS range measurements and effectively mitigate the influence of outliers on observation updates. This framework improves both observation quality and fusion stability. We validate the proposed method on a real-world platform in an underground parking garage. Experimental results demonstrate that, in complex indoor environments, the proposed approach exhibits significant advantages over existing algorithms, achieving higher localization accuracy and robustness while effectively suppressing UWB NLOS errors as well as IMU and VIO drift.

1. Introduction

In recent years, UAV technology has achieved remarkable progress, and its applications have expanded from traditional military reconnaissance to various civilian domains such as logistics delivery, building inspection, security surveillance, and disaster rescue [1]. These emerging application scenarios impose increasingly stringent requirements on the autonomous navigation and localization capabilities of UAVs. The integrated navigation system composed of GNSS and IMU can achieve centimeter-level positioning accuracy in outdoor environments. In contrast to the vast outdoor environment, indoor spaces lack GNSS signals, preventing UAVs from relying on conventional GNSS-based localization [2]. Although motion capture systems can provide sub-millimeter-level positioning accuracy, their high cost and limited coverage make them unsuitable for practical UAV navigation tasks. Therefore, achieving high-precision and high-robustness autonomous localization for UAVs in complex indoor environments has become a key challenge in the current development of UAV technology [3].
To address the challenges of indoor UAV localization, researchers have proposed various technical solutions, including positioning methods based on Wi-Fi, Bluetooth, vision, IMU, and UWB sensors [4,5,6]. Each of these methods possesses both advantages and limitations, making it difficult to independently provide stable and reliable localization results in complex and dynamic indoor environments. Specifically, Wi-Fi and Bluetooth-based positioning systems have limited accuracy and are easily affected by multipath effects [7,8,9]. Wi-Fi localization approaches exploit physical-layer channel state information (CSI) to achieve high-resolution positioning by jointly estimating multipath parameters in the angle-delay domain. Typical methods include maximum likelihood joint AoA/ToA estimation, variants such as JADED-RIP (Joint Angle and Delay Estimator and Detector), and sparse-recovery schemes based on iterative variational Bayes, which reconstruct a sparse path-gain vector on a discretized angle-delay grid [10,11,12,13,14]. However, these Wi-Fi CSI-based techniques generally require carefully calibrated antenna arrays, accurate synchronization, and favorable propagation conditions, and their performance may deteriorate in cluttered indoor environments with rich multipath. Visual odometry estimates UAV poses by matching feature points between consecutive image frames captured by a camera. It performs well in texture-rich environments; however, its performance deteriorates significantly under severe illumination changes, motion blur, or large textureless areas, potentially leading to localization failure [15,16,17]. The IMU estimates UAV poses by integrating acceleration and angular velocity, offering high-frequency measurements and short-term accuracy, but its localization error accumulates over time, resulting in severe drift [18]. UWB measures distance by calculating the Time of Flight (TOF) of radio signals. 
Under ideal Line-of-Sight (LOS) conditions, it can provide centimeter-level ranging accuracy. However, its major drawback lies in its extreme sensitivity to NLOS propagation. When signals are obstructed by obstacles, the measured distance tends to be significantly overestimated, severely degrading localization accuracy and reliability [19,20,21].
In indoor settings where NLOS propagation and multipath induce abnormal UWB ranges, researchers have developed a variety of outlier rejection techniques. Statistical methods based on multi-epoch data are simple and effective, as they identify and remove outliers that deviate significantly from the normal distribution through statistical analysis of consecutive measurements. These methods exploit the temporal consistency of data, achieving robust outlier handling with relatively low computational cost. Zhang et al. [22] proposed a multi-epoch outlier rejection algorithm based on Random Sample Consensus (RANSAC) to preprocess UWB measurements, effectively mitigating the negative impact of UWB outliers on the localization accuracy of multi-sensor fusion systems. Fan et al. [23] introduced a sliding-window-based outlier detection method that compares the distance variations between UWB and IMU prediction windows against a predefined threshold to detect abnormal values. However, this approach is highly sensitive to the chosen threshold and requires manual tuning for different environments. Another effective strategy is to directly handle outliers during the filtering process. Li et al. [24] proposed a method based on Mahalanobis distance and chi-square testing, in which the variance of UWB measurements associated with outliers is enlarged to reduce their influence on the filter’s state update. However, when the proportion of outliers is high, the effectiveness of this method becomes limited. Additionally, another straightforward approach is to compare the Mahalanobis distance against a predefined threshold and directly discard abnormal measurements [25]. Beyond processing measurement data itself, some studies attempt to detect and reject outliers by analyzing the physical characteristics of the received signals. Wang et al. [26] exploited the fact that received signal power attenuation in NLOS paths is typically greater than in LOS paths, using the power difference as a signal gain indicator and comparing it with a threshold to identify and eliminate NLOS measurements. However, in complex signal environments, this method may lead to misclassification. Stahlke et al. [27] directly fed the UWB Channel Impulse Response (CIR) into a Convolutional Neural Network (CNN) to classify LOS/NLOS conditions, thereby improving classification reliability and localization robustness in complex indoor environments. Pei et al. [28] proposed FCN-Attention, which combines a Fully Convolutional Network with an attention mechanism. Wang et al. [29] used CIR waveform features as inputs to systematically compare traditional machine learning models such as SVM, MLP, KNN, and XGBoost. However, the performance of these methods typically depends on the diversity and coverage of training data. Cross-environment generalization and online deployment remain challenging.
Although UWB systems can achieve high-precision 3D localization, their relatively low measurement frequency and sensitivity to signal occlusion limit their ability to provide continuous positioning during high-dynamic UAV flights. In contrast, Inertial Navigation Systems (INS) have the advantages of independence from external infrastructure and high output frequency, making them widely employed in indoor navigation tasks [30]. However, since INS relies on integration operations, its errors accumulate rapidly over time. To overcome the inherent limitations of single-sensor systems, researchers commonly integrate UWB with INS to achieve robust, high-frequency, and long-term stable localization [31]. Typical fusion strategies include filtering-based and optimization-based methods. Among the filtering-based fusion algorithms, the Extended Kalman Filter (EKF) is one of the most widely used techniques. For example, Liu et al. [32] proposed a UWB/IMU data fusion method based on EKF, which adopts a traditional linear regression calibration model and a mean filter for distance smoothing, effectively improving the system’s localization accuracy and robustness. Similarly, Feng et al. [33] utilized EKF to fuse IMU and UWB data and introduced both constant-velocity and constant-acceleration motion models to achieve smoother position estimation, thereby further enhancing system robustness and localization accuracy. However, the EKF is essentially an extension of the standard Kalman Filter (KF), where the nonlinear observation equations are linearized via first-order Taylor expansion for state estimation. When the system exhibits strong nonlinearity, this approximation neglects higher-order terms, leading to a degradation in estimation accuracy. To address the limitations of EKF in highly nonlinear systems, You et al. [34] employed the Unscented Kalman Filter (UKF) for IMU and UWB fusion localization. 
The UKF uses the unscented transform to more accurately propagate the mean and covariance of the system states, avoiding the linearization errors present in EKF and thus achieving higher localization accuracy. Both the Cubature Kalman Filter (CKF) and the UKF are designed to address the accuracy limitations of EKF when dealing with strongly nonlinear systems. Their core principle lies in using a set of carefully selected sigma points to propagate the mean and covariance through nonlinear transformations. Ji et al. [30] implemented UWB/IMU fusion using CKF, incorporating an adaptive factor to adjust the measurement noise covariance matrix and introducing a fading factor to suppress filter divergence. However, for INS systems where attitude is represented on Lie groups, both UKF and CKF still face challenges in computational efficiency and stability when handling attitude manifold nonlinearity. Consequently, the ESKF has gained widespread attention due to its effective handling of attitude error. This framework linearizes only the small error states, significantly enhancing the robustness and accuracy of the filter. Marković et al. [35] proposed a multi-sensor fusion algorithm based on ESKF, which integrates measurements from IMU, UWB, camera, and LiDAR while considering sensor drift and calibration errors. Moreover, an arbitration mechanism was introduced to remove abnormal sensor measurements before the fusion stage.
In addition to filtering-based multi-sensor fusion algorithms, optimization-based methods have increasingly attracted attention. Zheng et al. [36] proposed a tightly coupled graph optimization method for fusing UWB and IMU data. This model effectively utilizes multiple UWB tags, providing a new direction for the practical application of UWB/IMU fusion techniques. Xu et al. [37] proposed a general graph optimization-based localization framework that introduces ranging constraints and trajectory smoothness constraints into the position graph. The framework models and estimates the robot’s trajectory within a sliding window and solves for the optimal poses using optimization algorithms. Kang et al. [38] addressed the accuracy limitations in UWB-assisted UAV localization by proposing a factor graph fusion method based on incremental smoothing. The study aims to overcome the limitation of existing approaches that treat individual UWB range measurements as weak constraints. The core idea is to integrate high-frequency single-point UWB ranging data, low-frequency multilateration results, and IMU measurements into a unified factor graph framework. This method significantly improves localization accuracy and system robustness, especially in scenarios involving large motion variations. Song et al. [39] proposed a tightly coupled UWB/INS navigation system based on factor graph optimization to enhance UAV localization accuracy and robustness in indoor environments. The key innovation of this approach lies in discarding the traditional loosely coupled or semi-tightly coupled frameworks. Instead, it directly fuses raw UWB ranging information and IMU preintegration data within a factor graph, jointly optimizing all state variables, including position, velocity, attitude, and IMU biases. This tightly coupled architecture fully leverages the complementary characteristics of UWB and INS, effectively suppressing interference caused by UWB multipath effects. Zheng et al. [40] proposed a tightly coupled factor graph optimization method integrating UWB and LiDAR to address UWB localization drift caused by NLOS effects in indoor environments. The method fuses multi-source observation data within the factor graph and introduces an NLOS detection and correction module to identify and compensate for invalid UWB range measurements. Although optimization-based methods offer clear advantages in modeling flexibility and fusion accuracy, they still face certain limitations in practical applications. First, the optimization process is sensitive to the initial state; in dynamic scenarios or with large initial errors, it tends to settle in local minima and may even fail to converge. Second, optimization-based methods typically incur high computational complexity, making it difficult to meet the real-time localization requirements of highly dynamic platforms such as UAVs.
To address the limitations of existing methods and the aforementioned challenges, this paper proposes a high-precision multi-sensor fusion localization approach for indoor UAVs. The proposed method adopts a tightly coupled architecture based on the ESKF, which fuses multi-source information (including IMU prior propagation, raw UWB ranging, VIO relative poses, and TFmini altitude observations) within a unified state space to achieve efficient joint state updates and enhance system robustness. The main contributions of this work are summarized as follows:
  • A tightly coupled fusion framework built upon the ESKF leverages IMU, VIO, UWB, and TFmini observations to achieve accurate and robust localization. This framework fully exploits the complementary advantages of different sensors to achieve high-accuracy and high-robustness localization performance.
  • A sliding-window UWB model referenced to short-term VIO poses employs a RANSAC-based multi-epoch consistency check to reject outliers. This method significantly improves the statistical consistency and reliability of UWB observations, mitigates the influence of outliers on measurement updates, and thereby enhances fusion stability and positioning accuracy.
  • Extensive field experiments were conducted on a UAV system platform to validate the proposed algorithm in an underground parking garage. The experimental results demonstrate that the proposed method significantly outperforms single-sensor localization approaches in both accuracy and robustness, providing a reliable technical foundation for autonomous indoor navigation of UAVs.
The remainder of this paper is organized as follows: Section 2 details the proposed VIO-constrained multi-epoch outlier rejection method and the ESKF-based multi-sensor fusion algorithm. Section 3 presents and analyzes field experimental results to validate the effectiveness of the proposed approach. Section 4 concludes the paper.

2. Methods

This study builds a multi-source heterogeneous sensor fusion localization system on a UAV platform, integrating an IMU, a UWB tag, a stereo camera, a 3D LiDAR, and a TFmini. A single UWB tag receives wireless ranging signals from N UWB anchors deployed indoors, whose global positions are known (denoted as $A_i$, $i = 1, 2, \ldots, N$, with 3D coordinates $\mathbf{p}_{A_i}^w \in \mathbb{R}^3$ in the world frame). The world coordinate frame w is defined with respect to the UWB anchor layout. Specifically, the origin is located at the projection of anchor $A_1$ on the ground. The x-axis points towards the projection of anchor $A_2$, the z-axis is vertically upward, and the y-axis is determined by the right-hand rule. The tightly coupled system efficiently fuses four heterogeneous data sources: IMU, UWB, VIO, and TFmini. The core of the proposed system comprises two components: a VIO-constrained multi-epoch outlier rejection algorithm, and an ESKF-based tightly coupled fusion localization algorithm. The VIO-constrained outlier rejection leverages VIO’s short-horizon reliable pose estimates to identify and suppress NLOS and other abnormal UWB ranges in real time, substantially improving the usability of UWB observations and the overall accuracy of the fused system. The ESKF-based tightly coupled fusion algorithm effectively integrates measurements of different rates and characteristics to achieve accurate and robust state estimation. To clearly illustrate the information flow and fusion mechanism of the entire system, the overall architecture of the proposed tightly coupled fusion localization system is shown in Figure 1.
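The anchor-defined world frame described above can be constructed mechanically. The following is a minimal sketch (not the authors' implementation); it assumes the anchors are surveyed in some frame whose z-axis is already vertical, and the function name `build_world_frame` is ours:

```python
import numpy as np

def build_world_frame(anchors):
    """Re-express surveyed anchor coordinates in the paper's world frame:
    origin at the ground projection of A1, x-axis toward the ground
    projection of A2, z-axis vertically up, y-axis by the right-hand rule.
    `anchors` is an (N, 3) array in a frame whose z-axis is vertical."""
    a1, a2 = anchors[0], anchors[1]
    origin = np.array([a1[0], a1[1], 0.0])          # ground projection of A1
    x_dir = np.array([a2[0] - a1[0], a2[1] - a1[1], 0.0])
    x_dir /= np.linalg.norm(x_dir)                  # toward A2's projection
    z_dir = np.array([0.0, 0.0, 1.0])
    y_dir = np.cross(z_dir, x_dir)                  # right-hand rule
    R = np.vstack([x_dir, y_dir, z_dir])            # world-from-survey rotation
    return (anchors - origin) @ R.T
```

Any rigid survey of the anchors then yields the same world-frame coordinates, which is what makes the UWB range model below well defined.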

2.1. VIO-Constrained Multi-Epoch Outlier Rejection Algorithm

RANSAC is a general-purpose robust parameter estimation method that effectively handles significant outliers that may exist in UWB observation data. As a resampling technique, RANSAC first generates candidate solutions using the minimum number of observations required for the estimation model, and then gradually expands and incorporates consistent data points that satisfy the model constraints. To detect and reject outliers, we calculate the residuals between the predicted geometric distances (from the VIO trajectory and UWB anchor locations) and the measured UWB distances. If a residual exceeds a predefined threshold, indicating inconsistency with the model, the corresponding measurement is flagged as an outlier. In this paper, an improved RANSAC-based algorithm is employed to robustly reject UWB ranging outliers. The core idea of the algorithm is to leverage the high-accuracy relative trajectory information provided by VIO over a short period to perform consistency checks on multi-epoch UWB ranging data, thereby accurately identifying and eliminating outliers. The design and implementation of the algorithm primarily involve two aspects: the construction of the outlier rejection model and the specific implementation process. Figure 2 illustrates the conceptual diagram of the proposed RANSAC outlier rejection algorithm model.

2.1.1. Outlier Rejection Algorithm Model

In a short period of time, VIO can provide a relative trajectory that is highly consistent with the true trajectory. Therefore, we can use this trajectory and all corresponding UWB ranging data to estimate the relative positions of the UWB anchors, and use this as a model to identify outliers. In this paper, we employ a nonlinear optimization method to solve the above model and determine the positions of the anchors. For the i-th UWB anchor, assuming there are Q ranging data points within the sliding window, we construct an objective function to minimize the sum of squared measurement residuals:
$$F(\mathbf{p}_{A_i}^w) = \min_{\mathbf{p}_{A_i}^w} \sum_{j=1}^{Q} \left( \left\| \mathbf{p}_{b_j}^w + \mathbf{R}_{b_j}^w \mathbf{p}_{tag}^b - \mathbf{p}_{A_i}^w \right\|_2 - d_{i,j} \right)^2$$
Here, $\mathbf{p}_{A_i}^w$ denotes the model-estimated position of the i-th anchor in the world frame. $\mathbf{p}_{b_j}^w$ and $\mathbf{R}_{b_j}^w$ are, respectively, the position and orientation of the UAV in the world frame at the j-th UWB ranging epoch, obtained from VIO after coordinate transformation. Accordingly, the positions and orientations corresponding to all ranging data within the sliding window are represented as $T = \{(\mathbf{R}_{b_1}^w, \mathbf{p}_{b_1}^w), (\mathbf{R}_{b_2}^w, \mathbf{p}_{b_2}^w), \ldots, (\mathbf{R}_{b_Q}^w, \mathbf{p}_{b_Q}^w)\}$. $\mathbf{p}_{tag}^b$ is the pre-calibrated extrinsic position of the UWB tag with respect to the UAV body frame. $d_{i,j}$ is the range measurement from the UWB tag to the i-th anchor at the j-th epoch; correspondingly, all range measurements of the i-th anchor within the sliding window are denoted as $D_i = \{d_{i,1}, d_{i,2}, \ldots, d_{i,Q}\}$.
This paper employs the Levenberg–Marquardt (LM) algorithm to solve this nonlinear least-squares problem. To achieve fast convergence, we compute the Jacobian of the objective function with respect to the anchor position. The residual $r_{i,j}$ is defined as:
$$r_{i,j} = \left\| \mathbf{p}_{b_j}^w + \mathbf{R}_{b_j}^w \mathbf{p}_{tag}^b - \mathbf{p}_{A_i}^w \right\|_2 - d_{i,j}$$
The Jacobian matrix $\mathbf{J}_j$ is:
$$\mathbf{J}_j = \frac{\partial r_{i,j}}{\partial \mathbf{p}_{A_i}^w} = \frac{\left( \mathbf{p}_{A_i}^w - (\mathbf{p}_{b_j}^w + \mathbf{R}_{b_j}^w \mathbf{p}_{tag}^b) \right)^{\top}}{\left\| \mathbf{p}_{b_j}^w + \mathbf{R}_{b_j}^w \mathbf{p}_{tag}^b - \mathbf{p}_{A_i}^w \right\|_2}$$
The Levenberg–Marquardt (LM) algorithm is a nonlinear least-squares solver that combines the Gauss–Newton method and gradient descent. It iteratively approaches the optimal solution. The solving procedure is as follows:
  • Initialization: Provide an initial estimate of the anchor position $\mathbf{p}_{A_i}^{w,0}$.
  • Iterative update: At the k-th iteration, we seek an increment $\Delta \mathbf{p}_{A_i}^w$ that minimizes the sum of squared residuals. The LM algorithm approximates the nonlinear least-squares problem by a linear system and introduces a damping factor $\lambda$ to control the step size:
    $$(\mathbf{J}^{\top}\mathbf{J} + \lambda \mathbf{I})\, \Delta \mathbf{p}_{A_i}^w = -\mathbf{J}^{\top}\mathbf{r}$$
    where $\mathbf{J}$ is the Jacobian evaluated at the current estimate $\mathbf{p}_{A_i}^{w,k}$, with row vectors $\partial r_{i,j} / \partial \mathbf{p}_{A_i}^{w,k}$. $\mathbf{r}$ is the vector collecting all Q ranging residuals, $\mathbf{I}$ is the identity matrix, and $\lambda$ is the damping factor that balances between gradient-descent and Gauss–Newton updates.
  • Update and convergence check: Solve the linear system above to obtain $\Delta \mathbf{p}_{A_i}^w$, then update the anchor position estimate:
    $$\mathbf{p}_{A_i}^{w,k+1} = \mathbf{p}_{A_i}^{w,k} + \Delta \mathbf{p}_{A_i}^w$$
    If the new estimate decreases the objective function F, accept the update and decrease the damping factor $\lambda$; otherwise, reject the update and increase $\lambda$. This process repeats until a predefined convergence criterion is met (the norm of $\Delta \mathbf{p}_{A_i}^w$ falls below a threshold, or the maximum number of iterations is reached). Finally, we obtain an optimal estimate of the anchor position.
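The LM steps above can be sketched in a few lines of Python. This is an illustrative implementation under our own naming (not the authors' code); it assumes the VIO-derived tag positions $\mathbf{p}_{b_j}^w + \mathbf{R}_{b_j}^w \mathbf{p}_{tag}^b$ have already been stacked into an array:

```python
import numpy as np

def solve_anchor_lm(tag_positions, ranges, p0, max_iter=50, tol=1e-6):
    """Estimate one anchor's world position from Q tag positions (Q, 3)
    and the corresponding UWB ranges (Q,) via Levenberg-Marquardt."""
    p = np.asarray(p0, dtype=float)
    lam = 1e-3                                  # damping factor
    for _ in range(max_iter):
        diff = p - tag_positions
        dists = np.linalg.norm(diff, axis=1)
        r = dists - ranges                      # residuals r_{i,j}
        J = diff / dists[:, None]               # Jacobian rows (p_A - u)^T/||.||
        # Damped normal equations: (J^T J + lam I) dp = -J^T r
        dp = np.linalg.solve(J.T @ J + lam * np.eye(3), -J.T @ r)
        new_p = p + dp
        new_r = np.linalg.norm(new_p - tag_positions, axis=1) - ranges
        if new_r @ new_r < r @ r:               # cost decreased: accept
            p, lam = new_p, lam * 0.5
        else:                                   # cost increased: reject
            lam *= 2.0
        if np.linalg.norm(dp) < tol:
            break
    return p
```

With well-distributed (non-coplanar) epochs and LOS ranges, the iteration converges to the anchor position; the accept/reject rule on $\lambda$ mirrors the convergence check described above.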

2.1.2. Outlier Rejection Procedure

To effectively remove outliers in UWB ranging data, we propose a VIO-constrained multi-epoch outlier rejection algorithm in which RANSAC serves as a key component. The entire algorithm is carried out by combining the relative poses provided by the UAV’s VIO with the UWB ranging data. First, we set the maximum number of RANSAC iterations $N_{iter}$, the minimum sample size P for model construction, the residual threshold $\epsilon$, and the inlier count threshold L. In each iteration, we randomly select P ranging samples from the UWB sliding window and, together with the VIO trajectory, estimate an initial UWB anchor position using the LM algorithm. This yields the current model. Next, we evaluate all remaining ranging measurements against this model by computing their residuals and the residual sum. A measurement is marked as an inlier if its residual is below the preset threshold $\epsilon$. After multiple iterations, we select the model with the smallest residual sum and regard the associated set of ranging data as valid, since these measurements are most likely to originate from LOS propagation. Finally, all measurements classified as outliers are discarded, and only the valid data are used in the subsequent ESKF update. This procedure effectively increases the reliability of UWB observations and underpins the robustness and accuracy improvements of the proposed fusion framework. The pseudocode of the proposed UWB outlier rejection algorithm is provided in Algorithm 1.
Algorithm 1 VIO-Constrained Multi-Epoch Outlier Rejection
Input: UWB ranging data set within the sliding window $D_i$; UAV poses estimated by VIO $T$
Input: RANSAC iteration number $N_{iter}$; minimum sample size $P$; ranging residual threshold $\epsilon$; minimum inlier count $L$
Output: Valid UWB ranging data set identified as inliers $I_{final}$
 1: $k \leftarrow 0$, $I_{final} \leftarrow \emptyset$, $\epsilon_{optimal} \leftarrow \infty$
 2: for $k \leftarrow 1$ to $N_{iter}$ do
 3:   $D_{sample} \leftarrow$ randomly select $P$ samples from $D_i$
 4:   $\mathbf{p}_{A_i}^{model} \leftarrow$ solve anchor position with LM using $(D_{sample}, T)$
 5:   $I_{temp} \leftarrow \emptyset$, $\epsilon_{better} \leftarrow 0$
 6:   for each measurement $m \in D_i$ do
 7:     $r \leftarrow$ CalculateResidual$(m, \mathbf{p}_{A_i}^{model}, T)$
 8:     $\epsilon_{better} \leftarrow \epsilon_{better} + r$
 9:     if $r < \epsilon$ then
10:       $I_{temp} \leftarrow I_{temp} \cup \{m\}$
11:     end if
12:   end for
13:   if $|I_{temp}| > L$ then
14:     if $\epsilon_{better} < \epsilon_{optimal}$ then
15:       $\epsilon_{optimal} \leftarrow \epsilon_{better}$
16:       $I_{final} \leftarrow I_{temp}$
17:     end if
18:   end if
19: end for
20: return $I_{final}$
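The hypothesize-and-verify loop of Algorithm 1 can be sketched in Python as follows. This is our own illustrative version, not the authors' code: `tag_positions` are the VIO-derived tag world positions per epoch, the inner `fit` is a plain damped Gauss–Newton stand-in for the LM solve, and here the consensus score sums residuals over inliers only:

```python
import numpy as np

def ransac_reject(tag_positions, ranges, n_iter=100, p_min=4,
                  resid_thresh=0.3, min_inliers=8, seed=0):
    """RANSAC over a sliding window of UWB ranges to one anchor.
    Returns the indices of measurements accepted as inliers."""
    rng = np.random.default_rng(seed)
    Q = len(ranges)
    best_inliers, best_cost = np.array([], dtype=int), np.inf

    def fit(idx):
        # Damped Gauss-Newton fit of the anchor position on a minimal sample.
        p = tag_positions[idx].mean(axis=0) + 1e-3   # crude initial guess
        for _ in range(30):
            diff = p - tag_positions[idx]
            dists = np.linalg.norm(diff, axis=1)
            if not np.all(np.isfinite(dists)) or dists.min() < 1e-9:
                break                                # degenerate sample
            r = dists - ranges[idx]
            J = diff / dists[:, None]
            dp = np.linalg.solve(J.T @ J + 1e-6 * np.eye(3), -J.T @ r)
            p = p + dp
            if np.linalg.norm(dp) < 1e-10:
                break
        return p

    for _ in range(n_iter):
        sample = rng.choice(Q, size=p_min, replace=False)
        p_model = fit(sample)                        # candidate anchor model
        if not np.all(np.isfinite(p_model)):
            continue
        resid = np.abs(np.linalg.norm(tag_positions - p_model, axis=1) - ranges)
        inliers = np.flatnonzero(resid < resid_thresh)
        cost = resid[inliers].sum()                  # consensus score
        if len(inliers) > min_inliers and cost < best_cost:
            best_inliers, best_cost = inliers, cost
    return best_inliers
```

Because NLOS ranges are biased long, a model fitted from an all-LOS sample leaves them with large positive residuals, so they fall outside the threshold and are rejected before the ESKF update.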

2.2. ESKF-Based Multi-Sensor Fusion Localization Algorithm

By linearizing the error state on the Lie algebra, the ESKF significantly reduces the impact of nonlinear effects on filtering performance while maintaining low computational complexity, making it suitable for real-time UAV localization. Although the UWB system can provide position measurements of the UAV, accurate attitude perception during flight is equally crucial for precise localization. The IMU supplies high-frequency angular velocity and acceleration measurements, capturing rapid dynamic changes of the UAV; however, its measurement errors accumulate over time, leading to drift. VIO provides relative pose estimates using visual information but is prone to failure in environments with poor texture or severe illumination variations. The TFmini offers stable altitude information, compensating for VIO’s insufficient accuracy along the vertical axis. To efficiently fuse multi-source observation information, this paper proposes a tightly coupled ESKF (TC-ESKF) algorithm that integrates measurements from IMU, UWB, VIO, and TFmini to achieve accurate state estimation and error correction.

2.2.1. State Definition and Error Modeling

Within the ESKF framework, the system state is partitioned into the nominal state, the error state, and the true state. The nominal state evolves on the manifold according to the nonlinear dynamics (position and velocity lie in Euclidean space, while attitude lies on $SO(3)$); the error state is modeled in the tangent space of the manifold and is kept small to enable first-order linearization and Kalman updates; the true state is obtained by composing the nominal state with the error state, where position and velocity use additive composition, and the attitude error is injected through quaternion multiplication. This paper represents attitude with quaternions, models the attitude error by a 3D small-angle vector $\delta \boldsymbol{\theta}$, and enforces quaternion normalization and error reset after error injection to ensure numerical stability. The reference frames are defined as follows: position $\mathbf{p}$, velocity $\mathbf{v}$, and gravity $\mathbf{g}$ are expressed in the world frame w; the attitude quaternion $\mathbf{q}$ denotes the rotation from the body frame b to the world frame w (with $\mathbf{R}(\mathbf{q})$ the rotation matrix corresponding to $\mathbf{q}$); the gyroscope and accelerometer biases $\mathbf{b}_g$ and $\mathbf{b}_a$ are defined in the body frame. Based on these conventions, the nominal state vector is defined as
$$\mathbf{x} = \begin{bmatrix} \mathbf{p}^{\top} & \mathbf{v}^{\top} & \mathbf{q}^{\top} & \mathbf{b}_a^{\top} & \mathbf{b}_g^{\top} \end{bmatrix}^{\top} \in \mathbb{R}^{16}$$
where $\mathbf{p} \in \mathbb{R}^3$ is position, $\mathbf{v} \in \mathbb{R}^3$ is velocity, $\mathbf{q} \in \mathbb{H}$ is the unit attitude quaternion, and $\mathbf{b}_a, \mathbf{b}_g \in \mathbb{R}^3$ are the accelerometer and gyroscope biases, respectively.
The corresponding error state vector is
$$\delta \mathbf{x} = \begin{bmatrix} \delta \mathbf{p}^{\top} & \delta \mathbf{v}^{\top} & \delta \boldsymbol{\theta}^{\top} & \delta \mathbf{b}_a^{\top} & \delta \mathbf{b}_g^{\top} \end{bmatrix}^{\top} \in \mathbb{R}^{15}$$
where the attitude error is represented by a small-angle rotation vector $\delta \boldsymbol{\theta}$. This vector is converted into an error quaternion $\delta \mathbf{q}$ to be composed with the nominal quaternion. The first-order approximation of $\delta \mathbf{q}$ is $\delta \mathbf{q} \approx \begin{bmatrix} 1 & \tfrac{1}{2}\delta \boldsymbol{\theta}^{\top} \end{bmatrix}^{\top}$. The true state is described by the composition of the nominal and error states, with the relationship
$$\mathbf{x}_t = \begin{bmatrix} \mathbf{p} + \delta \mathbf{p} \\ \mathbf{v} + \delta \mathbf{v} \\ \mathbf{q} \otimes \delta \mathbf{q} \\ \mathbf{b}_a + \delta \mathbf{b}_a \\ \mathbf{b}_g + \delta \mathbf{b}_g \end{bmatrix}$$
where ⊗ denotes quaternion multiplication. This modeling approach enables the filter to maintain high numerical stability under nonlinear conditions.
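The composition above (the "error injection" step, including the normalization mentioned earlier) can be sketched in Python. This is our illustrative version, assuming quaternions stored as [w, x, y, z] and the first-order error quaternion $\delta \mathbf{q} \approx [1, \tfrac{1}{2}\delta\boldsymbol{\theta}]$:

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    w1, v1 = q1[0], q1[1:]
    w2, v2 = q2[0], q2[1:]
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

def inject_error(x_nom, dx):
    """Compose nominal and error states: additive for p, v, and biases;
    quaternion multiplication q (x) dq for attitude, then renormalize."""
    p, v, q, ba, bg = x_nom
    dp, dv, dth, dba, dbg = dx
    dq = np.concatenate(([1.0], 0.5 * dth))   # first-order error quaternion
    q_new = quat_mul(q, dq)
    q_new /= np.linalg.norm(q_new)            # enforce unit norm
    return p + dp, v + dv, q_new, ba + dba, bg + dbg
```

After injection, the error state is reset to zero and its covariance adjusted accordingly, which is the reset step the text refers to.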

2.2.2. IMU Motion Model and State Propagation

The IMU measures angular velocity ω m and specific force a m through the gyroscope and accelerometer, respectively. These measurements are affected by biases and random noise. The IMU motion model is given in (9):
$$\boldsymbol{\omega}_m = \boldsymbol{\omega} + \mathbf{b}_g + \mathbf{n}_g, \qquad \mathbf{a}_m = \mathbf{R}(\mathbf{q})^{\top}(\mathbf{a} - \mathbf{g}) + \mathbf{b}_a + \mathbf{n}_a$$
Here, $\mathbf{n}_a$ and $\mathbf{n}_g$ denote the accelerometer and gyroscope measurement noises, both modeled as zero-mean Gaussian distributions $\mathbf{n}_a \sim \mathcal{N}(\mathbf{0}, \sigma_a^2 \mathbf{I}_{3\times3})$ and $\mathbf{n}_g \sim \mathcal{N}(\mathbf{0}, \sigma_g^2 \mathbf{I}_{3\times3})$. The accelerometer and gyroscope biases, $\mathbf{b}_a$ and $\mathbf{b}_g$, follow random-walk processes; therefore, $\dot{\mathbf{b}}_g = \mathbf{n}_{b_g}$ and $\dot{\mathbf{b}}_a = \mathbf{n}_{b_a}$, where $\mathbf{n}_{b_g} \sim \mathcal{N}(\mathbf{0}, \sigma_{b_g}^2 \mathbf{I}_{3\times3})$ and $\mathbf{n}_{b_a} \sim \mathcal{N}(\mathbf{0}, \sigma_{b_a}^2 \mathbf{I}_{3\times3})$. The gravity vector is $\mathbf{g} = [0, 0, -9.81]^{\top}\,\mathrm{m/s^2}$ in the z-up world frame. Whenever new IMU measurements arrive, they are used to update the nominal state. Consequently, the continuous-time propagation of the nominal state is
$$\dot{\mathbf{p}} = \mathbf{v}$$
$$\dot{\mathbf{v}} = \mathbf{R}(\mathbf{q})(\mathbf{a}_m - \mathbf{b}_a) + \mathbf{g}$$
$$\dot{\mathbf{q}} = \tfrac{1}{2}\,\Omega(\boldsymbol{\omega}_m - \mathbf{b}_g)\,\mathbf{q}$$
$$\dot{\mathbf{b}}_a = \mathbf{0}, \qquad \dot{\mathbf{b}}_g = \mathbf{0}$$
where
$$\Omega(\boldsymbol{\omega}) = \begin{bmatrix} 0 & -\boldsymbol{\omega}^{\top} \\ \boldsymbol{\omega} & -[\boldsymbol{\omega}]_{\times} \end{bmatrix}, \qquad [\boldsymbol{\omega}]_{\times} = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix}.$$
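The nominal-state propagation above can be sketched as a single Euler step in Python. This is our illustrative version (not the authors' code); quaternions are stored as [w, x, y, z], the quaternion update uses the closed-form incremental rotation rather than the $\Omega$ matrix, and the gravity sign follows the z-up world frame:

```python
import numpy as np

def quat_mul(q1, q2):
    w1, v1 = q1[0], q1[1:]
    w2, v2 = q2[0], q2[1:]
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

def quat_to_rot(q):
    """Rotation matrix of a unit quaternion [w, x, y, z] (body to world)."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]])

def propagate_nominal(p, v, q, ba, bg, a_m, w_m, dt):
    """One Euler step of the nominal-state kinematics: integrate position,
    velocity, and attitude from bias-corrected IMU readings; biases fixed."""
    g = np.array([0.0, 0.0, -9.81])          # gravity, z-up world frame
    R = quat_to_rot(q)
    p_new = p + v * dt
    v_new = v + (R @ (a_m - ba) + g) * dt
    phi = (w_m - bg) * dt                    # incremental body rotation
    th = np.linalg.norm(phi)
    if th > 1e-12:
        dq = np.concatenate(([np.cos(th / 2)], np.sin(th / 2) * phi / th))
    else:
        dq = np.array([1.0, 0.0, 0.0, 0.0])
    q_new = quat_mul(q, dq)                  # right-multiply body increment
    q_new /= np.linalg.norm(q_new)
    return p_new, v_new, q_new, ba, bg
```

In practice this runs at the IMU rate between measurement updates; higher-order integration (e.g., midpoint) is a common refinement.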
The true state and the nominal state satisfy the same dynamic model; the difference is that the true state includes sensor noise terms, whereas the nominal state is driven only by ideal, noise-free measurements. By comparing the evolution of the true and nominal states, we obtain the continuous-time error-state equations. By performing first-order linearization and discretization of the error-state equations, we arrive at the discrete-time error-state propagation model as shown in (15):
$$\delta \mathbf{x}_{k+1} = \begin{bmatrix} \delta \mathbf{p}_{k+1} \\ \delta \mathbf{v}_{k+1} \\ \delta \boldsymbol{\theta}_{k+1} \\ \delta \mathbf{b}_{a,k+1} \\ \delta \mathbf{b}_{g,k+1} \end{bmatrix} = \begin{bmatrix} \delta \mathbf{p}_k + \delta \mathbf{v}_k \Delta t \\ \delta \mathbf{v}_k + \left( -\mathbf{R}(\mathbf{q}_{b_k}^n)[\mathbf{a}_{m,k} - \mathbf{b}_{a,k}]_{\times}\,\delta \boldsymbol{\theta}_k - \mathbf{R}(\mathbf{q}_{b_k}^n)\,\delta \mathbf{b}_{a,k} \right) \Delta t + \mathbf{n}_{v,k} \\ \mathbf{R}^{\top}\{(\boldsymbol{\omega}_{m,k} - \mathbf{b}_{g,k})\Delta t\}\,\delta \boldsymbol{\theta}_k - \delta \mathbf{b}_{g,k} \Delta t + \mathbf{n}_{\omega,k} \\ \delta \mathbf{b}_{a,k} + \mathbf{n}_{b_a,k} \\ \delta \mathbf{b}_{g,k} + \mathbf{n}_{b_g,k} \end{bmatrix}$$
The propagation process includes recursive updates of both the error state and its covariance. The complete discrete-time propagation is therefore given by (16):
$$\delta \mathbf{x}_{k+1} = \mathbf{F}_k\,\delta \mathbf{x}_k + \mathbf{G}_k \mathbf{w}_k, \qquad \mathbf{P}_{k+1} = \mathbf{F}_k \mathbf{P}_k \mathbf{F}_k^{\top} + \mathbf{G}_k \mathbf{Q}_k \mathbf{G}_k^{\top}$$
where F k is the state transition matrix, G k is the process noise input matrix, w k is the process noise vector, and Q k is the process noise covariance. Their specific forms are given in (17):
$$\mathbf{F}_k = \begin{bmatrix} \mathbf{I}_{3\times3} & \Delta t\,\mathbf{I}_{3\times3} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_{3\times3} & -\mathbf{R}(\mathbf{q}_{b_k}^n)[\mathbf{a}_{m,k} - \mathbf{b}_{a,k}]_{\times}\Delta t & -\mathbf{R}(\mathbf{q}_{b_k}^n)\Delta t & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{R}^{\top}\{(\boldsymbol{\omega}_{m,k} - \mathbf{b}_{g,k})\Delta t\} & \mathbf{0} & -\Delta t\,\mathbf{I}_{3\times3} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{I}_{3\times3} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{I}_{3\times3} \end{bmatrix}$$
$$\mathbf{G}_k = \begin{bmatrix} \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{I}_{3\times3} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_{3\times3} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{I}_{3\times3} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{I}_{3\times3} \end{bmatrix}, \qquad \mathbf{w}_k = \begin{bmatrix} \mathbf{n}_{v,k} \\ \mathbf{n}_{\omega,k} \\ \mathbf{n}_{b_a,k} \\ \mathbf{n}_{b_g,k} \end{bmatrix}$$
$$\mathbf{Q}_k = \mathrm{diag}\!\left( \sigma_a^2 \Delta t^2\,\mathbf{I}_{3\times3},\ \sigma_g^2 \Delta t^2\,\mathbf{I}_{3\times3},\ \sigma_{b_a}^2 \Delta t\,\mathbf{I}_{3\times3},\ \sigma_{b_g}^2 \Delta t\,\mathbf{I}_{3\times3} \right)$$
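Assembling the blocks of (17) and applying the covariance recursion of (16) can be sketched as follows. This is an illustrative numpy version under our own naming; the error-state ordering $[\delta \mathbf{p}, \delta \mathbf{v}, \delta \boldsymbol{\theta}, \delta \mathbf{b}_a, \delta \mathbf{b}_g]$ matches the text:

```python
import numpy as np

def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rot_exp(phi):
    """Rodrigues formula: rotation matrix of the axis-angle vector phi."""
    th = np.linalg.norm(phi)
    if th < 1e-12:
        return np.eye(3)
    A = skew(phi / th)
    return np.eye(3) + np.sin(th) * A + (1 - np.cos(th)) * (A @ A)

def propagate_covariance(P, R_wb, a_m, w_m, ba, bg, dt,
                         sig_a, sig_g, sig_ba, sig_bg):
    """One step of P <- F P F^T + G Q G^T with the blocks of Eq. (17)."""
    I3 = np.eye(3)
    F = np.eye(15)
    F[0:3, 3:6] = dt * I3
    F[3:6, 6:9] = -R_wb @ skew(a_m - ba) * dt
    F[3:6, 9:12] = -R_wb * dt
    F[6:9, 6:9] = rot_exp((w_m - bg) * dt).T
    F[6:9, 12:15] = -dt * I3
    G = np.zeros((15, 12))
    G[3:6, 0:3] = I3      # velocity noise
    G[6:9, 3:6] = I3      # attitude noise
    G[9:12, 6:9] = I3     # accel-bias random walk
    G[12:15, 9:12] = I3   # gyro-bias random walk
    Q = np.diag(np.concatenate([
        np.full(3, sig_a ** 2 * dt ** 2),
        np.full(3, sig_g ** 2 * dt ** 2),
        np.full(3, sig_ba ** 2 * dt),
        np.full(3, sig_bg ** 2 * dt)]))
    return F @ P @ F.T + G @ Q @ G.T
```

Since $\mathbf{F}_k$ has no velocity-to-attitude or bias-to-position coupling within one step, uncertainty flows into position only through accumulated velocity error, which matches the drift behavior described for pure INS.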

2.2.3. Multi-Sensor Observation Model

During the measurement update stage, this paper fuses observations from UWB, VIO, and TFmini. For UWB, we directly use raw ranging data as the filter’s measurement input rather than first computing positions and then using them as measurements. This tightly coupled strategy avoids information loss and error magnification caused by intermediate processing. For the camera, the pose provided by VIO is directly used as a measurement to the filter. The TFmini ranging measurement serves as a direct observation of altitude, adding a height constraint that compensates for the limited vertical accuracy of UWB and VIO. By fusing these observations, the filter fully leverages the strengths of each sensor to achieve more accurate and robust state estimation.
Assume that at time $k$, the available observations include $M$ UWB ranges $z_{\mathrm{UWB}} = [z_1, z_2, \dots, z_M]^{\top}$, the VIO-provided UAV pose $z_{\mathrm{VIO}}$, and the TFmini range $z_{\mathrm{tf}}$. The relationship between these observations and the system state is described by a nonlinear measurement function $h(x)$. The specific observation models are as follows:
For UWB observations, the localization system comprises several anchors and a UWB tag mounted on the UAV. Let the position and attitude of the body frame $b$ in the world frame be $p_b^w \in \mathbb{R}^3$ and $R_b^w \in SO(3)$, respectively. The UWB tag's position in the body frame is $t_u^b \in \mathbb{R}^3$. Then, the antenna phase center in the world frame is
$$ p_u^w = p_b^w + R_b^w t_u^b. $$
Accordingly, the ideal range between the i-th anchor and the UAV-mounted UWB antenna is the Euclidean distance between them:
$$ \hat{z}_i = \left\| p_u^w - p_{A_i}^w \right\|. $$
Accounting for measurement noise, the actual observation is modeled as
$$ z_i = h_i(x) + n_i = \left\| p_b^w + R_b^w t_u^b - p_{A_i}^w \right\| + n_i, $$
where $h_i(x)$ is the nonlinear measurement function and $n_i$ is zero-mean Gaussian noise, i.e., $n_i \sim \mathcal{N}(0, \sigma_{\mathrm{uwb}}^2)$. In practical UWB ranging scenarios, however, the Gaussian assumption can be violated due to NLOS propagation and multipath effects, and several works have reported heavy-tailed or skewed error distributions and advocated non-Gaussian or robust noise models for UWB measurements [41]. For simplicity and clarity of the formulation, we do not adopt such non-Gaussian noise models in this work.
To enable updates within the filtering framework, we linearize the measurement equation. Define
$$ d_i = p_u^w - p_{A_i}^w, \qquad \rho_i = \|d_i\|, \qquad \hat{e}_i = \frac{d_i}{\rho_i}. $$
Then the UWB residual can be approximated as
$$ r_i = z_i - \hat{z}_i \approx H_i\,\delta x + n_i, $$
with the measurement Jacobian
$$ H_i = \begin{bmatrix} \hat{e}_i^{\top} & 0_{1\times3} & -\hat{e}_i^{\top} R_b^w [t_u^b]_\times & 0_{1\times6} \end{bmatrix}. $$
Here, $\hat{e}_i \in \mathbb{R}^3$ is the unit direction vector from the $i$-th anchor to the UAV antenna (following the definition $d_i = p_u^w - p_{A_i}^w$), and $[t_u^b]_\times$ is the skew-symmetric matrix of $t_u^b$. The Jacobian shows that UWB observations are sensitive to position and attitude errors, but insensitive to velocity and IMU biases.
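As a concrete illustration, the residual and Jacobian of a single UWB range can be sketched as follows. This is a minimal sketch, assuming the 15-state column layout $[\delta p, \delta v, \delta\theta, \delta b_a, \delta b_g]$ from the propagation model; the function and variable names are illustrative:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def uwb_residual_jacobian(z_i, p_b, R_bn, t_u, p_anchor):
    """Residual r_i = z_i - z_hat_i and 1x15 Jacobian H_i for one UWB range.

    Columns: [dp (0:3), dv (3:6), dtheta (6:9), db_a (9:12), db_g (12:15)].
    """
    p_u = p_b + R_bn @ t_u                  # antenna phase center, world frame
    d = p_u - p_anchor
    rho = np.linalg.norm(d)
    e_hat = d / rho                         # unit vector anchor -> antenna
    r = z_i - rho                           # measured minus predicted range
    H = np.zeros((1, 15))
    H[0, 0:3] = e_hat                       # sensitivity to position error
    H[0, 6:9] = -e_hat @ (R_bn @ skew(t_u)) # lever-arm attitude sensitivity
    return r, H
```

With a zero lever arm the attitude block vanishes, matching the observation that a body-centered tag constrains only position.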
The VIO output used in this work comes from the open-source project VINS-Fusion [42], which provides the pose of the body frame $b$ in the world frame $w$. Within our filtering framework, we use only the VIO-provided position measurement $p_{b,\mathrm{meas}}^w$ as a direct constraint on the UAV position.
The VIO position measurement is modeled as
$$ z_{\mathrm{vio}} = p_{b,\mathrm{meas}}^w = h_p(x) + n_p, $$
where
$$ h_p(x) = p_b^w, $$
and $n_p \sim \mathcal{N}(0, R_{p,\mathrm{vio}})$ is the position measurement noise.
The position residual is defined as
$$ r_p = p_{b,\mathrm{meas}}^w - \hat{p}_b^w, $$
where $\hat{p}_b^w$ is the position of the nominal state. Linearizing around the nominal state $\bar{x}$ yields
$$ r_p \approx H_{\mathrm{vio}}\,\delta x + n_p, \qquad n_p \sim \mathcal{N}(0, R_{p,\mathrm{vio}}). $$
The corresponding measurement Jacobian for the position residual is
$$ H_{\mathrm{vio}} = \begin{bmatrix} I_{3\times3} & 0 & 0 & 0 & 0 \end{bmatrix}. $$
From the Jacobian, it is evident that the VIO-provided position information is sensitive only to the position error $\delta p$, and is insensitive to velocity, attitude, and IMU biases.
TFmini is a laser range sensor whose measurement can be modeled as the distance from the sensor optical center, along a given ray, to the geometric intersection with the ground plane in the world frame. Let $s$ denote the TFmini coordinate frame. The extrinsic calibration of TFmini with respect to the body frame is $(t_s^b, R_s^b)$. Denote by $u^s$ the unit direction vector of the TFmini beam in the sensor frame; its representations in the body and world frames are, respectively,
$$ u^b = R_s^b u^s, \qquad u^w = R_b^w u^b. $$
The sensor optical center in the world frame is
$$ p_s^w = p_b^w + R_b^w t_s^b. $$
Approximate the ground as a plane
$$ \Pi: \; n^{\top} x + d = 0, \qquad \|n\| = 1, $$
where $n$ is the plane normal expressed in the world frame and $d$ is the plane offset. The ideal TFmini measurement is defined as the signed distance $\hat{z}$ from the optical center $p_s^w$ along direction $u^w$ to the intersection with the plane $\Pi$. Thus, the predicted measurement is
$$ \hat{z} = -\frac{n^{\top} p_s^w + d}{n^{\top} u^w} = -\frac{n^{\top}\big(p_b^w + R_b^w t_s^b\big) + d}{n^{\top} R_b^w u^b}. $$
Let (32) be the nonlinear measurement function $h_{\mathrm{tf}}(x)$. The TFmini measurement model is then
$$ z_{\mathrm{tf}} = h_{\mathrm{tf}}(x) + n_{\mathrm{tf}}, \qquad n_{\mathrm{tf}} \sim \mathcal{N}(0, \sigma_{\mathrm{tf}}^2), $$
where $\sigma_{\mathrm{tf}}$ depends on surface reflectance, range, and attitude.
Linearizing $h_{\mathrm{tf}}(x)$ to first order around the nominal state $\bar{x}$, we construct the residual
$$ r_{\mathrm{tf}} = z_{\mathrm{tf}} - h_{\mathrm{tf}}(\bar{x}) \approx H_{\mathrm{tf}}\,\delta x + n_{\mathrm{tf}}. $$
For the Jacobian derivation, define
$$ \alpha \triangleq n^{\top} u^w = n^{\top} R_b^w u^b, \qquad \beta \triangleq -\big(n^{\top} p_s^w + d\big), $$
so that $\hat{z} = \beta / \alpha$. Taking derivatives of the main blocks yields
$$ \frac{\partial \hat{z}}{\partial p_b^w} = -\frac{n^{\top}}{\alpha}, $$
$$ \frac{\partial \hat{z}}{\partial \delta\theta} = \frac{1}{\alpha}\, n^{\top} R_b^w [t_s^b]_\times + \frac{\beta}{\alpha^2}\, n^{\top} R_b^w [u^b]_\times, $$
$$ \frac{\partial \hat{z}}{\partial v_b^w} = 0_{1\times3}, \qquad \frac{\partial \hat{z}}{\partial b_g} = 0_{1\times3}, \qquad \frac{\partial \hat{z}}{\partial b_a} = 0_{1\times3}. $$
Therefore, the TFmini measurement Jacobian is
$$ H_{\mathrm{tf}} = \begin{bmatrix} -\dfrac{n^{\top}}{\alpha} & 0_{1\times3} & \dfrac{n^{\top} R_b^w [t_s^b]_\times}{\alpha} + \dfrac{\beta\, n^{\top} R_b^w [u^b]_\times}{\alpha^2} & 0_{1\times3} & 0_{1\times3} \end{bmatrix}. $$
The TFmini altitude measurement directly constrains position (along $n$) and is sensitive to attitude errors through the lever arm $t_s^b$ and the beam direction $u^b$; it is insensitive to velocity and IMU biases (the corresponding columns are zero). In practice, the ground is often approximated as horizontal and the beam as approximately pointing vertically downward, i.e.,
$$ n = [0, 0, 1]^{\top}, \qquad d = 0, \qquad u^b \approx [0, 0, -1]^{\top}. $$
For convenience, let $e_z = [0, 0, 1]^{\top}$ denote the vertical unit vector in the world frame. Then
$$ \alpha \approx -e_z^{\top} R_b^w e_z, \qquad \beta \approx -e_z^{\top}\big(p_b^w + R_b^w t_s^b\big). $$
To reduce complexity, neglecting attitude effects yields a simpler Jacobian approximation:
$$ H_{\mathrm{tf}} \approx \begin{bmatrix} -\dfrac{e_z^{\top}}{\alpha} & 0_{1\times3} & 0_{1\times3} & 0_{1\times3} & 0_{1\times3} \end{bmatrix}. $$
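The plane-intersection prediction above can be sketched numerically as follows; this is a minimal sketch under the definitions $\alpha = n^{\top} R_b^w u^b$ and $\beta = -(n^{\top} p_s^w + d)$, with illustrative variable names:

```python
import numpy as np

def tfmini_prediction(p_b, R_bn, t_s, u_b, n, d):
    """Predicted TFmini range z_hat = beta / alpha to the plane n^T x + d = 0,
    with alpha = n^T (R u^b) and beta = -(n^T p_s + d)."""
    p_s = p_b + R_bn @ t_s      # sensor optical center in the world frame
    u_w = R_bn @ u_b            # beam direction in the world frame
    alpha = n @ u_w
    beta = -(n @ p_s + d)
    return beta / alpha, alpha, beta
```

For a level UAV 2 m above a horizontal floor, with $n = [0,0,1]^{\top}$, $d = 0$, and $u^b = [0,0,-1]^{\top}$, this gives $\alpha = -1$, $\beta = -2$, and a predicted range of 2 m, as expected.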

2.2.4. Filter Update and Error Injection

In the update stage, the observations from UWB, VIO, and TFmini are written in a unified measurement equation:
$$ z = h(x) + n, \qquad n \sim \mathcal{N}(0, R), $$
where $z$ is the sensor measurement vector, $h(\cdot)$ is the measurement function, and $R$ is the measurement noise covariance. Based on the nominal state $\bar{x}$, the predicted measurement is $\hat{z} = h(\bar{x})$, and the residual is computed as
$$ r = z - \hat{z}. $$
Using the ROS timestamps, all measurement residuals are interpolated to the UWB measurement times and then stacked as
$$ r = \begin{bmatrix} r_{\mathrm{uwb}} \\ r_{\mathrm{vio}} \\ r_{\mathrm{tf}} \end{bmatrix}, \qquad
H = \begin{bmatrix} H_{\mathrm{uwb}} \\ H_{\mathrm{vio}} \\ H_{\mathrm{tf}} \end{bmatrix}, \qquad
R = \mathrm{blkdiag}\big(R_{\mathrm{uwb}}, R_{\mathrm{vio}}, R_{\mathrm{tf}}\big), $$
where H is the measurement Jacobian, with row blocks corresponding to the linearization of different sensors. UWB measurements are sensitive mainly to position, VIO is sensitive to position and attitude, and TFmini is primarily sensitive to altitude and attitude; thus H exhibits a sparse structure.
According to the ESKF update equations, the Kalman gain is computed as
$$ K = P H^{\top}\big(H P H^{\top} + R\big)^{-1}. $$
Using this gain, the error state is updated to obtain the minimum mean-square estimate:
$$ \delta\hat{x} = K r, $$
where r is the measurement residual. Physically, K reflects the balance between the prior uncertainty P and the measurement reliability R .
Meanwhile, the error-state covariance must be updated to reflect the new uncertainty level:
$$ P \leftarrow (I - K H)\, P\, (I - K H)^{\top} + K R K^{\top}. $$
In practice, a reduced form is often used
$$ P \leftarrow (I - K H)\, P $$
to reduce computational complexity and improve numerical stability.
Next, the estimated error is injected into the nominal state:
$$ p \leftarrow p + \delta\hat{p}, $$
$$ v \leftarrow v + \delta\hat{v}, $$
$$ q \leftarrow q \otimes \exp\!\Big(\tfrac{1}{2}\,\delta\hat{\theta}\Big), $$
$$ b_a \leftarrow b_a + \delta\hat{b}_a, \qquad b_g \leftarrow b_g + \delta\hat{b}_g, $$
where the attitude is updated by right-multiplying a perturbation; exp ( · ) denotes the exponential map. Unlike direct additive updates on Euler angles, this approach avoids singularities and preserves the unit-norm constraint of the quaternion.
To re-define the error state as zero-mean after the update, an error reset is performed. The covariance must be corrected using the reset Jacobian $G_r$:
$$ P \leftarrow G_r\, P\, G_r^{\top}. $$
Here, $G_r$ equals the identity for the position, velocity, and bias blocks, while for the attitude block it is approximated by $I - \tfrac{1}{2}[\delta\hat{\theta}]_\times$, reflecting the nonlinearity of the special orthogonal group.
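Putting the update, injection, and reset steps together, the sequence above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the 15-state layout, the wxyz (scalar-first) quaternion convention, the reduced covariance form, and the helper names are our assumptions:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def quat_exp(dth):
    """Exponential map: rotation vector -> unit quaternion (wxyz)."""
    th = np.linalg.norm(dth)
    if th < 1e-12:
        return np.concatenate([[1.0], 0.5 * dth])  # small-angle approximation
    ax = dth / th
    return np.concatenate([[np.cos(th / 2)], np.sin(th / 2) * ax])

def quat_mul(q, p):
    """Hamilton product of two wxyz quaternions."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = p
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def eskf_update(x_nom, P, r, H, R):
    """Stacked ESKF update, error injection, and covariance reset.

    x_nom: dict with keys p, v, q (unit wxyz quaternion), b_a, b_g.
    """
    S = H @ P @ H.T + R                       # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
    dx = K @ r                                # error-state estimate
    P = (np.eye(P.shape[0]) - K @ H) @ P      # reduced covariance form
    # Inject the estimated error into the nominal state.
    x_nom['p'] = x_nom['p'] + dx[0:3]
    x_nom['v'] = x_nom['v'] + dx[3:6]
    dth = dx[6:9]
    x_nom['q'] = quat_mul(x_nom['q'], quat_exp(dth))  # right-multiplied perturbation
    x_nom['b_a'] = x_nom['b_a'] + dx[9:12]
    x_nom['b_g'] = x_nom['b_g'] + dx[12:15]
    # Error reset: correct the covariance with the reset Jacobian G_r.
    Gr = np.eye(15)
    Gr[6:9, 6:9] = np.eye(3) - 0.5 * skew(dth)
    P = Gr @ P @ Gr.T
    return x_nom, P
```

Updating the quaternion multiplicatively keeps it on the unit sphere, whereas an additive update on Euler angles would not.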

3. Experimental Validation

We built a UAV platform and conducted multiple experiments in an underground parking garage. The experimental area measured approximately 7 m × 7 m × 3 m, and the overall setup is shown in Figure 3. The UAV was equipped with an IMU (TDK InvenSense Inc., San Jose, CA, USA), a UWB ranging module (Shenzhen Nooploop Technology Co., Ltd., Shenzhen, China), a TFmini laser rangefinder (Benewake Beijing Co., Ltd., Beijing, China), a RealSense D435i camera (Intel Corporation, Santa Clara, CA, USA), and a MID360 LiDAR (Livox Technology Co., Ltd., Shenzhen, China). Data acquisition and processing were performed onboard by an Intel NUC computer (Intel i5-8400 CPU @ 2.80 GHz, 16 GB RAM, Santa Clara, CA, USA). The sensor configurations were as follows: the UWB module (Nooploop LinkTrack P) measured distances from the tag to the anchors at 10 Hz; the IMU (ICM-42688-P) output accelerations and angular velocities at 50 Hz; Visual–Inertial Odometry (VIO) observations were generated using the VINS-Fusion algorithm at 15 Hz; the LiDAR ran the Fast-LIO2 algorithm [43] to provide real-time poses at 20 Hz and served as the system ground truth for accuracy evaluation; and the TFmini measured UAV altitude at 50 Hz. Since the extrinsic parameters of the UWB and TFmini have minimal impact on position estimation, their mounting offsets can be neglected in practical applications. Four UWB anchors were deployed at the four corners of the experimental area, with coordinates $p_{A_1}^w = (0, 0, 0.81)$, $p_{A_2}^w = (0, 4.654, 0)$, $p_{A_3}^w = (3.823, 4.795, 0.856)$, and $p_{A_4}^w = (3.81, 0, 2.211)$.

3.1. Comparative Experiments

To rigorously validate the effectiveness and robustness of the proposed algorithm, we conducted multiple comparative experiments in real-world environments. Under identical hardware platforms, scenes, and motion paths, we evaluated seven representative localization methods: Linear Least Squares (LS), Nonlinear Least Squares (NLS), Particle Filter (PF), Loosely Coupled Extended Kalman Filter (EKF-LC), Loosely Coupled ESKF (ESKF-LC), VIO, and the proposed Tightly Coupled ESKF (ESKF-TC). To ensure fairness and reproducibility, all methods were run independently multiple times on the same dataset. The experimental design included five canonical trajectories: circular, rectangular, square, triangular, and rhombic. All trajectories commenced from the same initial position and attitude, followed predetermined paths, and ended with landing at the final waypoint; the flight altitude was approximately constant throughout. During the experiments, sensor data were recorded uniformly via ROS, and data processing and algorithm execution were performed offline. For evaluation, we used the high-accuracy trajectory generated by LiDAR odometry as ground truth (LIO-GT) and employed the Absolute Position Error (APE) to quantitatively compare localization accuracy and stability across methods. Due to space limitations, we present and analyze only the rectangular and circular trajectories; the quantitative metrics and trends for the remaining three trajectories (square, triangular, rhombic) are consistent with those reported.
Figure 4 presents a comparison of 3D localization trajectories and axis-wise position–time curves for all methods. Specifically, Figure 4a,c show the 3D trajectories for the rectangular and circular paths, respectively, while Figure 4b,d provide the corresponding axis-wise position–time plots. For clarity, different colors and line styles are used to distinguish methods: Raw denotes results without outlier rejection, OJ denotes results with outlier rejection, and LIO-GT denotes the Fast-LIO2 result, which serves as the ground truth in this paper. From the 3D trajectory comparisons, OURS (ESKF-TC) is overall the closest to LIO-GT, maintaining good agreement even at turning points and during intervals with large velocity changes. In contrast, PF-OJ deviates the most from LIO-GT, exhibiting pronounced cumulative drift. Further inspection of the axis-wise curves in Figure 4b reveals that fluctuations along the z axis are generally larger than those along the x and y axes across methods, indicating that vertical errors have a more significant impact on overall localization accuracy. By comparison, OURS exhibits lower instantaneous fluctuations and long-term drift on all three axes; in particular, along the z axis, thanks to TFmini's direct altitude constraint and the tightly coupled modeling that exploits measurement consistency, error accumulation and fluctuations are markedly suppressed. Overall, the proposed ESKF-TC demonstrates higher accuracy and robustness through multi-source fusion (UWB+VIO+TFmini).
Figure 5 shows the absolute position error (APE) curves over time for each algorithm, along with the smoothed error (moving average) and its relationship with time, including the 1 σ confidence bounds. From the figure, it can be seen that OURS maintains the lowest error level throughout, with its 1 σ confidence interval significantly converging and being the narrowest. These results indicate that OURS, when fusing UWB, VIO, and TFmini observations, achieves higher accuracy and stability, effectively suppressing error accumulation and short-term fluctuations.
Figure 6 shows the cumulative distribution function (CDF) of the APE for each algorithm: the horizontal axis represents the error threshold, and the vertical axis represents the probability of error not exceeding that threshold. Generally, the curve positioned further to the left and rising faster indicates smaller errors at most times and higher overall accuracy. From the figure, it can be seen that the CDF curve of OURS is consistently positioned to the upper-left of the other methods for both trajectory types, reflecting a significant accuracy advantage. For quantitative comparison, we use the 68.3 % quantile error (approximately 1 σ ) as the evaluation metric: for the circular trajectory, the error for OURS is 0.1077 m , better than the second-best PF-OJ ( 0.1182 m ), representing an approximately 8.9 % reduction; for the rectangular trajectory, OURS has an error of 0.1076 m , better than the second-best LS-OJ ( 0.1501 m ), representing an approximately 28.3 % reduction. Furthermore, VIO and several loosely coupled/single-observation variants exhibit significantly larger 1 σ position errors (e.g., for VIO on the circular trajectory, 0.2473 m , and for ESKF-LC-Raw on the rectangular trajectory, 0.1699 m ), indicating that missing other sensor observations or failing to suppress outliers limits both the leftward shift and steepness of the distribution. In summary, OURS shifts the error distribution leftward, significantly improving steepness, achieving lower typical errors, and demonstrating stronger robustness across different trajectory shapes.
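The 68.3% quantile metric used above can be computed directly from the per-epoch APE series; a small sketch follows (array and function names are illustrative, not from the paper's evaluation code):

```python
import numpy as np

def ape_series(est_xyz, gt_xyz):
    """Per-epoch absolute position error: Euclidean distance between the
    estimated and ground-truth positions (arrays of shape (N, 3))."""
    return np.linalg.norm(np.asarray(est_xyz) - np.asarray(gt_xyz), axis=1)

def quantile_error(ape, q=0.683):
    """q-quantile of the APE distribution (q = 0.683 approximates 1-sigma)."""
    return float(np.quantile(ape, q))
```

Plotting the sorted APE values against their empirical probabilities yields the CDF curves compared in Figure 6.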
As shown in Table 1, we systematically compare the APE of each algorithm on two trajectory types (Square and Circular) using RMSE, Mean, Std, Min, Max, and Median. Overall, OURS achieves the lowest RMSE, Mean, and Std on both trajectories, and is also best on Min and Median for the Square trajectory, as well as Max for the Circular trajectory. The few exceptions are as follows: the Square-Max is achieved by VIO with a value of 0.3174; the Circular-Min is achieved by ESKF-LC-Raw with 0.0035; and the Circular-Median is achieved by PF-OJ with 0.0786. In terms of magnitude, for the Square trajectory, OURS reduces RMSE from 0.1402 to 0.0972 relative to the second-best baseline ESKF-LC-Raw (a reduction of about 30.7%), Mean from 0.1212 to 0.0864 (about 28.7%), and Std from 0.0704 to 0.0446 (about 36.7%). For the Circular trajectory, OURS reduces RMSE from 0.1434 to 0.0944 relative to the best baseline LS-OJ (about 34.2%), Mean from 0.1246 to 0.0863 (about 30.7%), and reduces Std from 0.0624 to 0.0382 relative to ESKF-LC-Raw (about 38.8%). Although certain methods are superior on a single statistic (e.g., VIO for Square-Max, ESKF-LC-Raw for Circular-Min, PF-OJ for Circular-Median), their overall error levels and dispersion are clearly worse than those of OURS. These results indicate that introducing outlier rejection and tightly coupled ESKF fusion of multi-source observations effectively improves measurement quality and fusion stability, thereby achieving higher localization accuracy and robustness across different motion trajectories.

3.2. Ablation Study

To verify the effectiveness of the key modules in the proposed algorithmic framework and to assess the impact of potential module failures [44], we conducted an ablation study for comparative analysis. The evaluated variants include: a tightly coupled fusion method without UWB outlier rejection but fusing VIO and TFmini (TC Both-RAW); a tightly coupled method with outlier rejection but without the VIO position constraint (TC TFmini-OJ); a tightly coupled method with outlier rejection but without the TFmini altitude constraint (TC VIO-OJ); and a tightly coupled UWB–IMU fusion method with outlier rejection (TC UWB-OJ). Finally, we compare all variants against the proposed tightly coupled fusion method integrating IMU, UWB, VIO, and TFmini (OURS). All methods were tested through multiple independent trials under identical experimental environments, sensor configurations, and motion trajectories. Except for the ablated modules, the remaining implementation details and parameter settings were kept consistent to ensure fair comparison.
Figure 7 compares the 3D localization trajectories and the axis-wise position–time curves of all methods. Specifically, Figure 7a and Figure 7c show the 3D trajectories for the rectangular and circular paths, respectively, while Figure 7b and Figure 7d present the corresponding axis-wise position–time curves. It can be observed that OURS achieves the best localization accuracy among all methods in the ablation study, with trajectories closest to the LiDAR ground truth (LIO-GT). In contrast, removing the TFmini altitude constraint or the RANSAC outlier rejection module leads to noticeably larger deviations from ground truth, especially in regions with altitude changes or where UWB outliers are present, resulting in more pronounced error fluctuations. Moreover, methods that rely solely on UWB or VIO produce trajectories with substantial deviations and exhibit significant instability and drift. Overall, by means of tightly coupled multi-sensor fusion and robust outlier handling, OURS effectively improves localization accuracy and robustness, enabling high-precision UAV localization in complex environments.
Figure 8 presents the 3D position error profiles for all methods on both the rectangular and circular trajectory experiments. For each trajectory, the left subfigure plots the instantaneous absolute position error (i.e., the Euclidean distance between the estimated position and the ground truth) as a function of time, while the right subfigure shows the corresponding temporally smoothed error curve together with the 1 σ band computed over the entire trajectory. As seen in the figure, OURS exhibits the lowest error curve and the narrowest confidence interval among all ablation variants, indicating superior accuracy and stability. In comparison, removing the TFmini altitude constraint or the RANSAC outlier rejection module raises the overall error level and enlarges the fluctuation range, with more pronounced peaks in segments featuring altitude changes or UWB outliers. Additionally, methods relying solely on UWB or VIO show large fluctuations and strong instability and drift. Overall, through tightly coupled fusion and robust outlier rejection, OURS effectively suppresses error accumulation and abrupt changes, significantly enhancing localization accuracy and robustness.
Figure 9 presents the CDF of absolute errors for each method. In the ablation study, the CDF curve of OURS consistently lies to the upper-left of the other methods, indicating lower absolute position errors across all trajectory shapes. Quantitatively, at the 1 σ position (approximately the 68 % quantile), OURS attains threshold errors of 0.1076 and 0.1077 for the rectangular and circular trajectories, respectively—the smallest among all methods; in contrast, TC UWB-OJ yields the largest thresholds, 0.3114 (rectangular) and 0.1342 (circular). Aggregating statistics across different trajectory shapes, OURS achieves smaller errors for the vast majority of time, significantly outperforming the other ablation variants and reflecting higher accuracy. These results demonstrate that OURS, via tightly coupled multi-sensor fusion and robust outlier rejection, effectively improves localization accuracy and stability, enabling high-precision UAV localization in complex environments.
As shown in Table 2, we compare the APE statistics of all methods on two trajectory types (Square and Circular). OURS achieves the lowest RMSE and Mean on both trajectories and also the smallest Median; its Std is near-optimal (Square: 0.0446, Circular: 0.0382), slightly higher than TC VIO-OJ (0.0411/0.0362), indicating a low overall dispersion of errors. Although TC Both-RAW yields smaller maximum errors in the Max metric (Square: 0.2560; Circular: 0.2019), its RMSE/Mean is overall worse than OURS due to its susceptibility to abnormal UWB ranges without outlier rejection. The TC UWB-OJ (with outlier rejection enabled) exhibits notably larger RMSE and Std on both trajectories, indicating difficulty in suppressing cumulative drift and fluctuations without visual and altitude constraints. Variants that remove a single observation, TC TFmini-OJ and TC VIO-OJ, may be better on individual statistics (e.g., Min/Std), but their overall accuracy and stability remain inferior to OURS. In terms of quantitative gains, relative to TC Both-RAW (no outlier rejection), OURS reduces RMSE by about 4.0%/1.1% on Square/Circular trajectories; relative to TC TFmini-OJ/TC VIO-OJ, OURS improves RMSE by about 10.2%/23.4% (Square) and 4.6%/11.2% (Circular). These results indicate that multi-epoch outlier rejection and tightly coupled ESKF fusion of IMU/UWB/VIO/TFmini effectively enhance measurement quality and fusion stability, significantly reducing localization errors and their dispersion.

4. Conclusions

This paper targets GNSS-denied, heavily occluded indoor environments and proposes a tightly coupled multi-sensor localization framework based on the ESKF. The core idea is to drive state prediction with an IMU motion model, incorporate raw UWB ranges, VIO relative poses, and TFmini altitude jointly in the measurement update, and apply a VIO-constrained multi-epoch outlier rejection to perform geometric consistency screening of UWB measurements, thereby suppressing NLOS-induced outliers at the source. Real-world experiments in an underground parking garage show that, on rectangular and circular trajectories, the proposed method attains the best performance on typical metrics (Square: 0.0972/0.0864/0.0446 for RMSE/Mean/Std; Circular: 0.0944/0.0863/0.0382), with CDF error thresholds at the 68.3% quantile of 0.1076/0.1077 m, and curves that are overall “more left and steeper,” indicating smaller errors and stronger stability for the vast majority of time. Ablation results further validate the necessity of each module: removing the altitude constraint or outlier rejection yields noticeably larger errors and fluctuations; relying only on IMU-UWB leads to more pronounced cumulative drift. Overall, the synergy of tightly coupled fusion and robust preprocessing enables high-frequency, accurate, and robust localization under multi-source asynchrony, multipath interference, and locally degraded conditions.

Author Contributions

Conceptualization, Z.D. and E.H.; methodology, B.L. and Y.L.; software, J.Z.; validation, J.Z.; formal analysis, J.Z.; investigation, J.Z.; resources, Z.D. and E.H.; data curation, J.Z. and W.S.; writing—original draft preparation, J.Z.; writing—review and editing, J.Z.; visualization, B.L., W.S. and Y.L.; supervision, Z.D. and E.H.; project administration, Z.D.; funding acquisition, Z.D. and E.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China [Grant number: 6220020330].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liu, H.; Long, Q.; Yi, B.; Jiang, W. A survey of sensors based autonomous unmanned aerial vehicle (UAV) localization techniques. Complex Intell. Syst. 2025, 11, 371. [Google Scholar] [CrossRef]
  2. Lin, H.Y.; Zhan, J.R. GNSS-denied UAV indoor navigation with UWB incorporated visual inertial odometry. Measurement 2023, 206, 112256. [Google Scholar] [CrossRef]
  3. Kramarić, L.; Jelušić, N.; Radišić, T.; Muštra, M. A Comprehensive Survey on Short-Distance Localization of UAVs. Drones 2025, 9, 188. [Google Scholar] [CrossRef]
  4. Pang, S.; Zhang, B.; Lu, J.; Pan, R.; Wang, H.; Wang, Z.; Xu, S. Application of IMU/GPS Integrated Navigation System Based on Adaptive Unscented Kalman Filter Algorithm in 3D Positioning of Forest Rescue Personnel. Sensors 2024, 24, 5873. [Google Scholar] [CrossRef] [PubMed]
  5. Sun, Z.; Gao, W.; Tao, X.; Pan, S.; Wu, P.; Huang, H. Semi-tightly coupled robust model for GNSS/UWB/INS integrated positioning in challenging environments. Remote Sens. 2024, 16, 2108. [Google Scholar] [CrossRef]
  6. Mascaro, R.; Teixeira, L.; Hinzmann, T.; Siegwart, R.; Chli, M. Gomsf: Graph-optimization based multi-sensor fusion for robust uav pose estimation. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 1421–1428. [Google Scholar]
  7. Wang, Y.; Cheng, H.; Meng, M.Q.H. A Learning-Based Sequence-to-Sequence WiFi Fingerprinting Framework for Accurate Pedestrian Indoor Localization Using Unconstrained RSSI. IEEE Internet Things J. 2025, 12, 36765–36777. [Google Scholar] [CrossRef]
  8. Li, Z.; Zhang, Y. Constrained ESKF for UAV positioning in indoor corridor environment based on IMU and WiFi. Sensors 2022, 22, 391. [Google Scholar] [CrossRef]
  9. Ayub, A.; Abidin, Z.Z.; Alhammadi, A.; Soliman, N.F.; Khan, M.A.; Ghazali, N.B. Comparative Analysis of Machine Learning Algorithms for BLE-Based Indoor Localization System. IEEE Access 2025, 13, 167120–167138. [Google Scholar] [CrossRef]
  10. Bellili, F.; Amor, S.B.; Affes, S.; Ghrayeb, A. Maximum likelihood joint angle and delay estimation from multipath and multicarrier transmissions with application to indoor localization over IEEE 802.11 ac radio. IEEE Trans. Mob. Comput. 2018, 18, 1116–1132. [Google Scholar] [CrossRef]
  11. Bazzi, A.; Slock, D.T.; Meilhac, L. Efficient maximum likelihood joint estimation of angles and times of arrival of multiple paths. In Proceedings of the 2015 IEEE Globecom Workshops (GC Wkshps), San Diego, CA, USA, 6–10 December 2015; pp. 1–7. [Google Scholar]
  12. Abdelkhalek, M.; Ben Amor, S.; Affes, S. Data-Aided Maximum Likelihood Joint Angle and Delay Estimator Over Orthogonal Frequency Division Multiplex Single-Input Multiple-Output Channels Based on New Gray Wolf Optimization Embedding Importance Sampling. Sensors 2024, 24, 5821. [Google Scholar] [CrossRef]
  13. Bazzi, A.; Slock, D.T.; Meilhac, L. Sparse recovery using an iterative variational Bayes algorithm and application to AoA estimation. In Proceedings of the 2016 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Washington, DC, USA, 12–14 December 2016; pp. 197–202. [Google Scholar]
  14. Bazzi, A.; Slock, D.T.; Meilhac, L. JADED-RIP: Joint angle and delay estimator and detector via rotational invariance properties. In Proceedings of the 2016 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Washington, DC, USA, 12–14 December 2016; pp. 160–165. [Google Scholar]
  15. Li, J.; Wang, S.; Hao, J.; Ma, B.; Chu, H.K. UVIO: Adaptive Kalman Filtering UWB-Aided Visual-Inertial SLAM System for Complex Indoor Environments. Remote Sens. 2024, 16, 3245. [Google Scholar] [CrossRef]
  16. Sun, K.; Mohta, K.; Pfrommer, B.; Watterson, M.; Liu, S.; Mulgaonkar, Y.; Taylor, C.J.; Kumar, V. Robust stereo visual inertial odometry for fast autonomous flight. IEEE Robot. Autom. Lett. 2018, 3, 965–972. [Google Scholar] [CrossRef]
  17. Su, W.; Deng, Z. Online Temporal Calibration for Relative Transformation Estimation Systems. IEEE Robot. Autom. Lett. 2025, 10, 4444–4451. [Google Scholar] [CrossRef]
  18. Tian, Q.; Kevin, I.; Wang, K.; Salcic, Z. A low-cost INS and UWB fusion pedestrian tracking system. IEEE Sens. J. 2019, 19, 3733–3740. [Google Scholar] [CrossRef]
  19. Xu, Y.; Wan, D.; Bi, S.; Guo, H.; Zhuang, Y. A FIR filter assisted with the predictive model and ELM integrated for UWB-based quadrotor aircraft localization. Satell. Navig. 2023, 4, 2. [Google Scholar] [CrossRef]
  20. Yuan, S.; Lou, B.; Nguyen, T.M.; Yin, P.; Cao, M.; Xu, X.; Li, J.; Xu, J.; Chen, S.; Xie, L. Large-scale uwb anchor calibration and one-shot localization using gaussian process. In Proceedings of the 2025 IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, 19–23 May 2025; pp. 3132–3138. [Google Scholar]
  21. Nguyen, T.M.; Yuan, S.; Cao, M.; Lyu, Y.; Nguyen, T.H.; Xie, L. Ntu viral: A visual-inertial-ranging-lidar dataset, from an aerial vehicle viewpoint. Int. J. Robot. Res. 2022, 41, 270–280. [Google Scholar] [CrossRef]
  22. Zhang, T.; Yuan, M.; Wei, L.; Wang, Y.; Tang, H.; Niu, X. MR-ULINS: A Tightly-Coupled UWB-LiDAR-Inertial Estimator with Multi-Epoch Outlier Rejection. IEEE Robot. Autom. Lett. 2024, 9, 11786–11793. [Google Scholar] [CrossRef]
  23. Fan, M.; Li, J.; Wang, W. An IMU/UWB tightly coupled navigation algorithm to improve positioning accuracy under large-scale NLOS conditions. Meas. Sci. Technol. 2025, 36, 045105. [Google Scholar] [CrossRef]
Figure 1. System overview of the proposed localization system.
Figure 2. Schematic diagram of the outlier rejection algorithm model.
Figure 3. Underground parking garage experiment and platform: (a) experimental scene; (b) UAV platform.
Figure 4. 3D localization trajectories and axis-wise position curves for different algorithms: (a) rectangular trajectory; (b) axis-wise position curves of the rectangular trajectory; (c) circular trajectory; (d) axis-wise position curves of the circular trajectory.
Figure 5. Localization error comparison for different trajectories: (a) rectangular trajectory localization error curve; (b) circular trajectory localization error curve.
Figure 6. CDF localization error comparison for different algorithms and trajectories: (a) rectangular trajectory CDF error curve; (b) circular trajectory CDF error curve.
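The CDF error curves in Figures 6 and 9 summarize, for each error threshold, the fraction of epochs whose localization error falls below that threshold. The sketch below shows one common way such a curve can be produced from per-epoch error samples; it is an illustration only (the error values are made up, not taken from the experiments), not the authors' plotting code.

```python
def empirical_cdf(errors):
    """Return (sorted errors, cumulative probabilities) describing the
    empirical CDF of a list of per-epoch localization errors (m)."""
    xs = sorted(errors)
    n = len(xs)
    ps = [(i + 1) / n for i in range(n)]  # P(error <= xs[i])
    return xs, ps

def cdf_at(errors, threshold):
    """Fraction of epochs whose error is at most `threshold` meters."""
    return sum(e <= threshold for e in errors) / len(errors)

# Illustrative per-epoch errors (m); plotting xs vs. ps yields a CDF curve
# like those in Figures 6 and 9.
errs = [0.05, 0.12, 0.08, 0.20, 0.07]
xs, ps = empirical_cdf(errs)
```

Reading a point off the curve, e.g. `cdf_at(errs, 0.10)`, gives the share of epochs with error within 0.10 m.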
Figure 7. 3D localization trajectories and axis-wise position curves for different algorithms: (a) rectangular trajectory; (b) axis-wise position curves of the rectangular trajectory; (c) circular trajectory; (d) axis-wise position curves of the circular trajectory.
Figure 8. Localization error comparison for different trajectories: (a) rectangular trajectory localization error curve; (b) circular trajectory localization error curve.
Figure 9. CDF localization error comparison for different algorithms and trajectories: (a) rectangular trajectory CDF error curve; (b) circular trajectory CDF error curve.
Table 1. Localization Error Statistics Comparison of Different Algorithms on Various Trajectories.

Methods         Square Trajectory                                      Circular Trajectory
                RMSE     Mean     Std      Min      Max      Median    RMSE     Mean     Std      Min      Max      Median
LS-Raw          0.1736   0.1469   0.0926   0.0127   0.6076   0.1209    0.1709   0.1467   0.0878   0.0097   0.5756   0.1290
LS-OJ           0.1617   0.1357   0.0880   0.0103   0.6019   0.1142    0.1434   0.1246   0.0710   0.0147   0.3764   0.1144
NLS-Raw         0.2584   0.2228   0.1309   0.0259   0.7701   0.1856    0.2268   0.1576   0.1631   0.0114   1.1560   0.1068
NLS-OJ          0.2468   0.2163   0.1188   0.0201   0.6720   0.1851    0.1915   0.1392   0.1315   0.0125   0.7675   0.1016
PF-OJ           0.2967   0.2774   0.1053   0.0615   0.5569   0.2604    0.1547   0.1149   0.1036   0.0080   0.7522   0.0786*
VIO             0.1692   0.1519   0.0746   0.0058   0.3174*  0.1335    0.2452   0.2239   0.0999   0.0396   0.4108   0.2270
EKF-LC-Raw      0.3042   0.2744   0.1313   0.0627   0.8757   0.2397    0.2097   0.1453   0.1512   0.0117   1.0306   0.0991
ESKF-LC-Raw     0.1402   0.1212   0.0704   0.0078   0.3823   0.1132    0.1585   0.1457   0.0624   0.0035*  0.5564   0.1595
OURS            0.0972*  0.0864*  0.0446*  0.0056*  0.3207   0.0779*   0.0944*  0.0863*  0.0382*  0.0041   0.3588*  0.0871*

Note: All values are in meters (m). RMSE = Root Mean Square Error, Std = Standard Deviation. Values marked with * are the best in each metric.
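The statistics reported in Tables 1 and 2 (RMSE, Mean, Std, Min, Max, Median) are standard summaries of the per-epoch position error. A minimal sketch of how they can be computed is given below; the sample error values are illustrative, not data from the experiments.

```python
import math
import statistics

def error_stats(errors):
    """Summarize a list of per-epoch localization errors (m) with the
    metrics used in Tables 1 and 2."""
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {
        "RMSE": rmse,
        "Mean": statistics.fmean(errors),
        "Std": statistics.pstdev(errors),  # population std over the run
        "Min": min(errors),
        "Max": max(errors),
        "Median": statistics.median(errors),
    }

# Illustrative per-epoch 3D position errors (m)
stats = error_stats([0.05, 0.12, 0.08, 0.20, 0.07])
```

Note that RMSE weights large errors more heavily than the mean, which is why outlier epochs (e.g. NLOS-corrupted UWB updates) inflate RMSE and Max more than Median.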
Table 2. Ablation Study: Localization Error Statistics Comparison on Different Trajectories.

Methods         Square Trajectory                                      Circular Trajectory
                RMSE     Mean     Std      Min      Max      Median    RMSE     Mean     Std      Min      Max      Median
TC Both-RAW     0.1013   0.0910   0.0445   0.0086   0.2560*  0.0849    0.0955   0.0869   0.0397   0.0095   0.2019*  0.0880
TC TFmini-OJ    0.1082   0.0995   0.0425   0.0039*  0.2713   0.0968    0.0990   0.0907   0.0396   0.0061   0.2304   0.0878
TC VIO-OJ       0.1269   0.1201   0.0411*  0.0246   0.2973   0.1148    0.1063   0.1000   0.0362*  0.0099   0.3081   0.1006
TC UWB-OJ       0.3884   0.3063   0.2388   0.0191   0.9226   0.1878    0.1460   0.1252   0.0752   0.0078   0.4397   0.1017
OURS            0.0972*  0.0864*  0.0446   0.0056   0.3207   0.0779*   0.0944*  0.0863*  0.0382   0.0041*  0.3588   0.0871*

Note: All values are in meters (m). RMSE = Root Mean Square Error, Std = Standard Deviation. Values marked with * are the minimum in each column.

Zhao, J.; Deng, Z.; Hu, E.; Su, W.; Lou, B.; Liu, Y. An Indoor UAV Localization Framework with ESKF Tightly-Coupled Fusion and Multi-Epoch UWB Outlier Rejection. Sensors 2025, 25, 7673. https://doi.org/10.3390/s25247673
