Article

Enhancing Integrated Navigation with a Self-Attention LSTM Hybrid Network for UAVs in GNSS-Denied Environments

1 School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
2 Beijing Special Electromechanical Research Institute, Beijing 100012, China
3 Yangtze Delta Region Academy of Beijing Institute of Technology, No. 1940, East North Road, Youchegang Town, Xiuzhou District, Jiaxing 314019, China
* Author to whom correspondence should be addressed.
Drones 2025, 9(4), 279; https://doi.org/10.3390/drones9040279
Submission received: 28 January 2025 / Revised: 10 March 2025 / Accepted: 5 April 2025 / Published: 7 April 2025

Abstract
Performing long-duration navigation without the global navigation satellite system (GNSS) network is a challenging task, particularly for small unmanned aerial vehicles (UAVs) equipped with low-cost micro-electro-mechanical sensors. This study proposes a hybrid neural network that integrates self-attention mechanisms with long short-term memory (SALSTM) to enhance GNSS-denied navigation performance. The estimation task of GNSS-denied navigation is first modeled based on UAV aerodynamics and kinematics, enabling a precise definition of the inputs and outputs that SALSTM needs to map. A self-attention layer is inserted in multiple LSTM layers to capture long-range dependencies in subtle dynamic changes. The output layer is designed to generate state sequences, leveraging the recursive nature of LSTM to enforce state continuity constraints. The outputs of SALSTM are fused to enhance integrated navigation within an extended Kalman filter framework. The performance of the proposed method is evaluated using flight data obtained from field tests. The results demonstrate that SALSTM-enhanced integrated navigation achieves superior long-term stability and improves velocity and position estimation accuracy by more than 50% compared to the best existing methods.

1. Introduction

Navigation in GNSS-denied environments has become a key research focus due to potential signal blockage and interference. In classical methods, inertial navigation systems (INSs) and global navigation satellite systems (GNSSs) are two of the most well-established navigation methods [1]. INSs provide high-frequency measurements and estimates, but their errors accumulate due to persistent bias and noise in the rapid measurements. GNSS has no cumulative error and is used to correct INS estimates within the Kalman filter framework [2]. Satellite-denied navigation is particularly challenging for small unmanned aerial vehicles (SUAVs), as they usually rely on low-cost micro-electro-mechanical sensors (MEMSs) as the measurement unit for INS, which typically exhibit greater measurement noise and error. Numerous researchers have introduced different techniques to address these problems.
The integration of additional sensors is a widely adopted strategy to enhance navigation in GNSS-denied environments. Common methods include utilizing external observation base stations [3,4], fusing information from communication base stations [5] or ultra-wideband (UWB) systems [6,7], and adopting simultaneous localization and mapping (SLAM) to achieve independent navigation [8]. While these solutions can increase navigation accuracy if implemented with care, higher cost and complexity are inevitable byproducts of additional sensor usage [9]. In addition, these methods still have issues in base station-less, large-scale outdoor, and map-less environments [7,8].
In recent years, neural networks have been an important technical solution revolutionizing navigation techniques [10]. Some earlier studies used radial basis function (RBF) neural networks to produce expected values during GNSS outages [11,12]. Follow-up studies have shown that multi-layer perceptron [13] (MLP) can effectively increase the robustness of INS/GNSS techniques. The mentioned networks suffer from poor generalizability in high-noise situations and cannot store prior information and extract dynamical patterns between their inputs [14]. Recurrent neural networks (RNNs) are more capable of capturing and learning the sequential characteristics of dynamic models. Further research has proposed methods aided by RNNs to improve positional accuracy, with INS and RNNs integrated in a loosely coupled manner [15]. A similar study also used RNNs to provide velocity and position increments in a GNSS-denied environment, with tests conducted on a boat [16]. Other networks that use sequence data as input are recurrent fuzzy wavelet neural networks (RFWNNs). RFWNNs have an enhanced ability to deal with uncertainty, nonlinearity, and complex signals and have been utilized to characterize dynamic model, position, and velocity errors based on current and past information [17].
However, the learning gradient of networks that use RNNs to model recursive features vanishes exponentially [18], making it difficult to effectively utilize long sequence data. Considering these problems, the focus of later studies has been on using long short-term memory (LSTM) networks to aid GNSS-denied navigation [19,20,21,22]. LSTM introduces cell states and gating mechanisms to control the flow of information, enabling the capture of time-series features of its inputs over longer intervals. Gated Recurrent Units (GRUs) share a similar mechanism with LSTM and have become a focal point of applied research in recent years [23,24]. Compared to LSTM, GRUs achieve higher computational efficiency through a simplified architecture but sacrifice some of the long-term memory capability inherent to LSTM. In recent studies, attention mechanisms have been combined with LSTM to extract signal autocorrelation and cross-correlation [1]. In addition, stacking convolutional layers with LSTM/GRU layers is believed to capture spatial and temporal structures hidden inside the INS data [14,25,26].
Some studies have also developed deep reinforcement learning frameworks based on LSTM for perception and control in unknown or dynamic environments, which is an end-to-end navigation and guidance approach but does not explicitly perform state estimation [27,28,29]. Following LSTM, the self-attention mechanism was proposed and became central to transformer architectures, primarily for parallelization and scalability [30,31]. This mechanism weighs the importance of different parts of an input sequence when performing inference and computes attention scores between all positions in the sequence, enabling the model to capture long-range dependencies.
So far, scholars have focused on using and improving neural networks with different architectures to assist in navigation and have achieved good application results. However, according to our survey, few existing methods have modeled this estimation problem in GNSS-denied environments. Most studies have attempted to directly employ neural networks as pseudo-GNSSs to provide position and velocity measurements [19,20,21,22]. However, neural networks face challenges in effectively capturing the weak correlation between inertial measurements and position. Achieving this requires not only a substantial volume of training data but also the use of highly complex network architectures [14]. Even so, mapping such subtle relationships may introduce unnecessary errors.
As shown in Figure 1, in this study, the estimation task under GNSS denial is first modeled based on UAV aerodynamics. Based on this model, the corresponding inputs and outputs of the neural network are carefully selected, thereby avoiding invalid mappings. Meanwhile, we design a new self-attention and LSTM (SALSTM) hybrid network to improve inference performance when the UAV's dynamics are not well excited. The self-attention layer is inserted among multiple LSTM layers. Thus, through the LSTM layers, both the input and output sequences maintain recursive properties, which is crucial for preserving the dynamic continuity constraints of the UAV model. Finally, real-time SALSTM inference of attitude and velocity is fused for integrated navigation in the extended Kalman filter (EKF) framework, which provides long-term stable attitude and velocity estimation and improves position estimation accuracy.
The organization of this paper is described as follows. Section 2 presents the aerodynamics of UAVs and establishes an optimization model for navigation estimation without GNSS. Section 3 introduces the SALSTM-enhanced navigation method for GNSS-denied environments, including the data fusion approach in Section 3.2 and the design of the SALSTM architecture in Section 3.3. In Section 4, the proposed method is evaluated using data collected from fixed-wing UAV field flight tests. Section 5 summarizes this study and indicates directions for future work.

2. Estimation Model

Interpretability/explainability is one of the essential issues of machine learning, as it relates to the extremely important issue of safety assurance (related to navigation) [18]. Artificial neural networks model the system as a black box, rendering the selection of input and output variables critically important. Improperly defined input and output configurations can introduce noise and even prevent the training process from converging. To enhance the inference accuracy and robustness of SALSTM, this study analyzes the flight dynamics and kinematics of UAVs and establishes a velocity and attitude estimation model for GNSS-denied environments. Based on this analysis, appropriate input and output sequences are selected, and an optimized neural network architecture is designed.

2.1. Aerodynamics of UAVs

For fixed-wing UAVs equipped with inertial sensors, it is impossible to directly obtain unbiased position, attitude, and velocity measurements from inertial data when GNSS is denied. On the other hand, the relative velocity ($V_r$) of a UAV with respect to the surrounding airmass determines the aerodynamic forces ($F_a$), while $V_r$ depends on the ground velocity ($V$) and the Euler attitude ($\xi$). Since $F_a$ can be calculated from inertial measurements, this section first analyzes the relationship among the inertial measurements, $V$, and $\xi$. The reference frames associated with this relationship are illustrated in Figure 2. Below, superscript $b$ denotes values in the body frame, and $n$ denotes values in the navigation frame.
First, the relationship between inertial measurements and aerodynamic force is expressed as follows:
$$ m f_{sf}^b = m a^b - C_n^b g^n = F_T^b + C_a^b F_a + C_n^b G^n $$
where $m$ is the mass of the UAV, $f_{sf}$ is the specific force measured by inertial sensors, $a$ is the acceleration relative to the ground, $C_n^b$ is the rotation matrix from the navigation frame to the body frame, $g$ is the gravitational acceleration, $F_T^b$ is the thrust, $C_a^b$ is the rotation matrix from the wind frame to the body frame, and $G$ is gravity.
Secondly, the simplified aerodynamic force in Equation (1) is given in Ref. [32], where derivatives relative to rudder deflection and angular velocity are disregarded, as these states change at high frequencies. When defined in the wind frame, the aerodynamic force is denoted as follows:
$$ F_a = \begin{bmatrix} -D \\ Y \\ -L \end{bmatrix} = \bar{Q} S \begin{bmatrix} -\left( C_{D0} + C_{D\alpha^2} \alpha^2 \right) \\ C_{Y\beta} \beta \\ -\left( C_{L0} + C_{L\alpha} \alpha \right) \end{bmatrix} $$
where $\bar{Q}$ is the dynamic pressure, $S$ is the reference area, $\alpha$ is the angle of attack, $\beta$ is the sideslip angle, $C_{D0}$ is the drag coefficient when $\alpha = 0$, $C_{D\alpha^2}$ represents the change in the drag coefficient with respect to $\alpha^2$, $C_{Y\beta}$ represents the change in the lateral force coefficient with respect to $\beta$, $C_{L0}$ is the lift coefficient when $\alpha = 0$, and $C_{L\alpha}$ represents the change in the lift coefficient with respect to $\alpha$.
Variables $\bar{Q}$, $\beta$, and $\alpha$ in Equation (2) are defined based on the relative velocity in the body frame as follows:
$$ V_a = \left\| V_r^b \right\|, \quad \alpha = \tan^{-1}\left( w/u \right), \quad \beta = \sin^{-1}\left( v/V_a \right), \quad \bar{Q} = 0.5 \rho V_a^2 $$
where $V_a$ is the airspeed; $u$, $v$, and $w$ are the components of $V_r^b$ along the x, y, and z-axes, respectively, in the body frame; and $\rho$ denotes air density.
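For illustration, Equations (2) and (3) can be evaluated numerically. The following numpy sketch uses hypothetical function names and argument layouts of our own; the sign convention (drag and lift negative along the wind-frame x and z axes) is a standard assumption, not stated explicitly in the text:

```python
import numpy as np

def flow_angles(v_rel_body, rho=1.225):
    """Airspeed, angle of attack, sideslip, and dynamic pressure (Eq. 3).

    v_rel_body: relative velocity [u, v, w] in the body frame (m/s);
    rho defaults to sea-level air density (an assumption).
    """
    u, v, w = v_rel_body
    V_a = np.linalg.norm(v_rel_body)          # airspeed
    alpha = np.arctan2(w, u)                  # angle of attack
    beta = np.arcsin(v / V_a)                 # sideslip angle
    Q_bar = 0.5 * rho * V_a**2                # dynamic pressure
    return V_a, alpha, beta, Q_bar

def aero_force_wind(Q_bar, alpha, beta, S, CD0, CD_a2, CY_b, CL0, CL_a):
    """Simplified aerodynamic force in the wind frame (Eq. 2)."""
    D = Q_bar * S * (CD0 + CD_a2 * alpha**2)  # drag
    Y = Q_bar * S * CY_b * beta               # side force
    L = Q_bar * S * (CL0 + CL_a * alpha)      # lift
    return np.array([-D, Y, -L])
```

For a UAV flying at 20 m/s with a small vertical relative velocity, `flow_angles` yields a positive angle of attack and zero sideslip, as expected.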
Finally, according to the definition of the so-called "wind triangle", $V_r^b$ is related to the attitude, ground velocity, and wind velocity ($V_w$) as in Equation (4). Here, the attitude is represented in the form of a rotation matrix ($C_n^b$).
$$ V_r^b = C_n^b \left( V^n - V_w^n \right) $$

2.2. State Dynamics

The system state is time-continuous and exhibits recursive characteristics. The states ($x^*$) are the attitude, denoted by Euler angles, and the ground velocity in the navigation frame. The parameters ($p^*$) are the aerodynamic derivatives, reference area, mass, wind velocity, and geomagnetic vector. The states are to be estimated, and the parameters are assumed to be either constant or slowly changing.
$$ x^* = \begin{bmatrix} \xi \\ V^n \end{bmatrix} $$
$$ p^* = \begin{bmatrix} C_{L0}, C_{L\alpha}, C_{Y\beta}, S, m, V_w^n, M_I^n \end{bmatrix}^T $$
where $M_I^n$ is the geomagnetic vector in the navigation frame.
The input data include the angular rate $\omega$ and $f_{sf}$ as follows:
$$ u^* = \begin{bmatrix} \omega \\ f_{sf} \end{bmatrix} $$
The state transition function is expressed as follows:
$$ f^*\left( x^*, u^* \right) = \begin{bmatrix} \dot{\xi} \\ \dot{V}^n \end{bmatrix} = \begin{bmatrix} T_r \omega \\ C_b^n f_{sf} + g \end{bmatrix} $$
where $C_b^n = C_n^{b\,T}$ and $T_r$ is constructed from the roll angle ($\phi$) and pitch angle ($\theta$) as follows:
$$ T_r = \begin{bmatrix} 1 & \sin\phi \sin\theta / \cos\theta & \cos\phi \sin\theta / \cos\theta \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi / \cos\theta & \cos\phi / \cos\theta \end{bmatrix} $$
Then, the predicted state is expressed as follows:
$$ x_{k+1}^* = x_k^* + T_s \cdot f\left( x_k^*, u_k^* \right) + \nu_{x^*,k} $$
where subscript $k$ denotes values at time $t_k$, $T_s$ is the sampling interval, and $\nu_{x^*,k}$ is the process noise of $x^*$.
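To make the prediction step concrete, the following numpy sketch evaluates the Euler-rate matrix $T_r$ (Equation (9)) and one forward-Euler step of Equation (10); function names are ours, and the specific force is assumed to be pre-rotated into the navigation frame:

```python
import numpy as np

def euler_rate_matrix(phi, theta):
    """T_r from Eq. (9): maps body angular rates to Euler angle rates."""
    sp, cp = np.sin(phi), np.cos(phi)
    tt, ct = np.tan(theta), np.cos(theta)
    return np.array([
        [1.0, sp * tt, cp * tt],
        [0.0, cp,      -sp    ],
        [0.0, sp / ct, cp / ct],
    ])

def predict_state(xi, v_n, omega, f_sf_n, g_n, Ts):
    """One forward-Euler step of Eq. (10), noise omitted.

    f_sf_n is the specific force already rotated to the navigation frame
    (C_b^n f_sf), so the velocity derivative is f_sf_n + g_n (Eq. 8).
    """
    phi, theta, _ = xi
    xi_next = xi + Ts * euler_rate_matrix(phi, theta) @ omega
    v_next = v_n + Ts * (f_sf_n + g_n)
    return xi_next, v_next
```

In steady level flight (zero roll and pitch), a pure yaw rate integrates directly into the heading angle, while the velocity is unchanged when the specific force balances gravity.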
Consider the following parameter:
$$ p_{k+1}^* = p_k^* + \nu_{p^*,k} $$
where $\nu_{p^*,k}$ is the process noise of $p^*$.

2.3. Observation Model

Measurements used to correct the system state are the measured airspeed, the specific force measured by inertial sensors, and magnetic intensity. The measurement vector is expressed as follows:
$$ z^* = \begin{bmatrix} V_p, f_y, f_z, M_x, M_y, M_z \end{bmatrix}^T $$
where $V_p$ is the pitot airspeed; $f_y$ and $f_z$ denote the components of $f_{sf}$ along the y and z-axes, respectively, in the body frame; and $M_x$, $M_y$, and $M_z$ represent the three-axis magnetic field intensity ($M_{mag}$) measured by the magnetometer.
The time-varying measurement equations are expressed as follows:
$$ z^* = h^*\left( x^*, u^* \right) + \eta_{z^*} $$
where $\eta_{z^*}$ is the measurement noise.
As shown in Equation (14), the first measurement equation uses the definition of airspeed to relate the measured airspeed to the relative velocity. The second and third measurement equations use the y- and z-accelerometers, together with the aerodynamic model, to estimate velocity and attitude. The fourth through sixth measurement equations utilize the magnetometer measurements and the geomagnetic vector to estimate heading.
$$ h^*\left( x^*, u^* \right) = \begin{bmatrix} \left\| \left[ u, v, w \right] \right\| \\ \frac{v}{u} C_{Y\beta} \bar{Q} S \cos\left( \frac{v}{u} \right) / m \\ -\left( \frac{w}{u} C_{L\alpha} + C_{L0} \right) \bar{Q} S \cos\left( \frac{w}{u} \right) / m \\ C_n^b M_I \end{bmatrix} + \lambda^* $$
where $\left[ u, v, w \right]^T = C_n^b \left( V^n - V_w^n \right)$ and $\lambda^*$ is the approximation error within the models, which is assumed to be white Gaussian noise. For simplicity, when the flow angles $\alpha$ and $\beta$ are small, the following approximations are made: $\alpha \approx w/u$, $\beta \approx v/u$, $C_a^b \approx \mathrm{diag}\left( \sin\alpha, \cos\alpha, \cos\beta \right)$.
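A numpy sketch of the measurement prediction in Equation (14) follows; the function name and argument layout are ours, and the sign of the z-accelerometer row (lift acting along the negative body z-axis) is our assumed convention:

```python
import numpy as np

def h_star(C_nb, V_n, V_w_n, M_I_n, Q_bar, S, m, CY_b, CL0, CL_a):
    """Measurement prediction of Eq. (14), using the small-flow-angle
    approximations alpha ~ w/u and beta ~ v/u from the text."""
    u, v, w = C_nb @ (V_n - V_w_n)       # relative velocity, body frame (Eq. 4)
    beta = v / u
    alpha = w / u
    V_a = np.linalg.norm([u, v, w])      # predicted pitot airspeed
    f_y = CY_b * beta * Q_bar * S * np.cos(beta) / m            # y-accelerometer
    f_z = -(CL0 + CL_a * alpha) * Q_bar * S * np.cos(alpha) / m  # z-accelerometer
    M_b = C_nb @ M_I_n                   # predicted magnetometer reading
    return np.concatenate(([V_a, f_y, f_z], M_b))
```

With the identity attitude, zero wind, and purely forward flight, the predicted airspeed equals the ground speed, the lateral accelerometer reading is zero, and the magnetometer prediction equals the geomagnetic vector.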

2.4. Moving Horizon Estimation

Moving horizon estimation of the state ($x^*$) can be achieved by leveraging both the state dynamics and observation models of the system. This process involves constructing the objective function (Equation (15)) based on the observation model and establishing time-continuous constraints (Equation (16)) based on the state transition function.
$$ J\left( x_i^*, p_i^*, u_i^* \right) = \min_{x_i^*,\, p_i^*} \left\| \begin{bmatrix} x_{k-n}^* - \hat{x}_{k-n}^* \\ p_{k-n}^* - \hat{p}_{k-n}^* \end{bmatrix} \right\|^2 + \sum_{i=k-n}^{k} \left\| z_i^* - h^*\left( x_i^*, u_i^* \right) \right\|^2 $$
$$ \begin{aligned} x_{i+1}^* - x_i^* - 0.5\, T_s \left[ f\left( x_{i+1}^*, u_{i+1}^* \right) + f\left( x_i^*, u_i^* \right) \right] &= 0 \\ p_{i+1}^* - p_i^* - \nu_{p_i^*} &= 0 \end{aligned} $$
Given the definitions of x * , p * , u * , and the measurement equations, it is evident that the objective function is continuous. Additionally, the domains of x * and p * are compact sets. According to the Weierstrass theorem, the objective function is bounded, and a globally optimal solution exists.
Nevertheless, due to the complex dynamics and kinematics of UAVs, the objective function exhibits highly nonlinear characteristics, rendering high-frequency online optimization extremely challenging. However, by leveraging the easily accessible flight data from UAVs, this optimization problem can be transformed into a regression problem. Specifically, neural networks can be trained to map sequences u i * and z * i to x i * , instead of directly solving the objective function.

3. Methodology

As analyzed in Section 2, since neural networks can be used to model the relationship between measurements and attitude/velocity, we propose the integration of a neural network into an integrated navigation system to enhance navigation performance. On one hand, neural networks struggle to infer navigation states in real time at a sufficiently high frequency; on the other hand, the framework of integrated navigation allows for the fusion of more diverse information, thereby improving the robustness of the navigation system.

3.1. Integrated Navigation Framework

Sensor configurations for navigation in small UAVs typically include GNSS, MEMS-based Inertial Measurement Units (MIMUs), barometric altimeters, pitot airspeed sensors, and magnetometers. These sensors are largely applicable to most environmental conditions and are, therefore, widely used as a basic sensor suite. The integrated navigation system based on the above sensors provides position, attitude, and velocity estimation using a Kalman filter for information fusion (as shown in Figure 3).
When GNSS is available, the MIMU's $f_{sf}$ and $\omega$, pitot airspeed ($V_p$), barometric altitude ($alt_{baro}$), magnetic field intensity measured by the magnetometer ($M_{mag}$), GNSS velocity ($V_{GNSS}$), and GNSS position ($P_{GNSS}$) are fused for navigation. Then, $f_{sf}$, $\omega$, $V_p$, $M_{mag}$, and the estimated Euler attitude ($\hat{\xi}$) and ground velocity ($\hat{V}$) are captured and used to train the designed neural network.
The selection of these training data is based on the estimation model described in Section 2, where $f_{sf}$ and $\omega$ are used to construct $u^*$; $f_{b,y}$, $f_{b,z}$, $V_p$, and $M_{mag}$ are used to construct $z^*$; and $\hat{\xi}$ and $\hat{V}$ serve as the optimal $x^*$.
As shown in Figure 4, when GNSS is denied, there is no need to solve the objective function (Equation (15)) to provide attitude and velocity measurements. The Euler attitude ($\xi_{nn}$) and velocity ($V_{nn}$, denoted by subscript "nn") output by SALSTM are considered optimal; that is, based on the moving-horizon sequence data of $f_{sf}$, $\omega$, $V_p$, and $M_{mag}$, the well-trained SALSTM infers $\xi_{nn}$ and $V_{nn}$ in real time to assist with navigation.
In the Kalman filter framework, the vehicle's attitude ($q_{INS}$), velocity ($V_{INS}$), and relative position ($P_{INS}$) are first estimated based on the IMU's accelerometers and gyroscope (subscripted as "INS") [33]. Next, $V_p$, $alt_{baro}$, $\xi_{nn}$, and $V_{nn}$ are fused as measurements to correct the estimates. As the magnetometer data have already been used to infer $\xi_{nn}$ and $V_{nn}$, they are not reused for the measurement update. Conversely, pitot airspeed is fused due to considerations of wind field estimation.
This fusion estimation process is described in detail in Section 3.2. The network architecture of the designed SALSTM is detailed in Section 3.3.

3.2. Data Fusing Methodology

The system model of UAVs is vital in capturing the dynamics of UAV maneuvers in integrated navigation. Considering the nonlinearity of kinematics in the flight environment, a nonlinear system model is used, which is assumed to satisfy the Gaussian process as follows:
$$ \begin{aligned} x_k &= f\left( x_{k-1}, u_{k-1} \right) + \nu_{k-1} \\ z_k^i &= h^i\left( x_k, u_k \right) + \eta_k^i \end{aligned} $$
where $x_k$ is the system state; $z_k^i$ is the measurement vector; $f(\cdot)$ and $h(\cdot)$ are the nonlinear system dynamic and measurement equations, respectively; $\nu_k$ and $\eta_k^i$ are Gaussian white process and measurement noise, respectively; and $i$ is the index of the measurement equation in the system model.
The navigation strategies are carried out in the navigation frame. The state vector includes the unit quaternion, ground velocity, position, angular rate bias ( b ω ), acceleration bias ( b f ), and two-dimensional horizontal wind speed. The unit quaternion is often used for the fast attitude solution and rigid-body transformation. The Euler attitude can be derived using a unit quaternion. The x vector is designed with a 19-dimensional state denoted by Equation (18).
$$ x = \begin{bmatrix} q, V, P, b_\omega, b_f, V_w \end{bmatrix}^T $$
The input ($u$) includes the angular velocity and specific force measured by the IMU as follows:
$$ u = \begin{bmatrix} \omega, f_{sf} \end{bmatrix}^T $$
Measurements ($z^i$) of the different sensors include the following:
(1) Altitude by barometer: $z^1 = alt_{baro}$.
(2) Pitot airspeed: $z^2 = V_p$.
(3) Neural network inference of attitude and velocity: $z^3 = \begin{bmatrix} \xi_{nn} \\ V_{nn} \end{bmatrix}$.

3.2.1. System Dynamics

The quaternion ($q$) consists of four elements, $q = q_0 + q_1 i + q_2 j + q_3 k$, and is updated as follows:
$$ q_k = q_{k-1} \otimes \Delta q $$
where $q_k$ and $q_{k-1}$ are unit quaternions at times $t_k$ and $t_{k-1}$, respectively. The symbol $\otimes$ denotes quaternion multiplication, and $\Delta q$ is calculated as follows:
$$ \Delta q \approx \begin{bmatrix} 1 \\ \omega / 2 \end{bmatrix} $$
ω is modeled by the following equation:
$$ \omega = \tilde{\omega} - n_\omega - b_\omega $$
where $\tilde{\omega}$ is the measured angular rate and $n_\omega$ is the gyroscope noise. $b_\omega$ is the gyroscope zero bias and is generally modeled as a first-order Markov process with zero-mean Gaussian noise as follows:
$$ b_{\omega,k} = b_{\omega,k-1} \exp\left( -\frac{T_s}{T_\omega} \right) + n_{b_\omega,k-1} $$
where $T_\omega$ is the gyroscope correction time and $n_{b_\omega,k-1}$ denotes zero-mean Gaussian noise.
The velocity and position can be derived in the navigation frame by Equations (24) and (25), respectively.
$$ V_k = V_{k-1} + \frac{1}{2} \left( C_{b,k}^n f_{sf,k} + C_{b,k-1}^n f_{sf,k-1} \right) T_s + g T_s $$
$$ P_k = P_{k-1} + \frac{1}{2} \left( V_k + V_{k-1} \right) T_s $$
where $V_k$ and $P_k$ are the velocity and position at time $t_k$, respectively; $C_{b,k-1}^n$ is the rotation matrix at time $t_{k-1}$; and $f_{sf,k}$ is the specific force at time $t_k$, obtained by the following equation:
$$ f_{sf,k} = \tilde{f}_{sf,k} - n_f - b_{f,k-1} $$
where $\tilde{f}_{sf,k}$ is the three-axis measured specific force and $n_f$ is the accelerometer noise. $b_{f,k-1}$ is the accelerometer zero bias and is assumed to be a first-order Markov process with zero-mean Gaussian noise as follows:
$$ b_{f,k} = b_{f,k-1} \exp\left( -\frac{T_s}{T_f} \right) + n_{b_f,k-1} $$
where $T_f$ is the accelerometer correction time and $n_{b_f,k-1}$ is zero-mean Gaussian noise.
The 3D wind speed ( V w , k ) is updated according to the Dryden wind model [34] as follows:
$$ V_{w,k} = V_{w,k-1} - T_s V_{a,k-1} \left( V_{w,k-1} \circ \frac{1}{L_{k-1}} \right) + n_{V_w,k} $$
where $L_{k-1}$ is the spatial wavelength, "$\circ$" denotes the Hadamard product, and $n_{V_w,k}$ is the process noise:
$$ n_{V_w,k} = \sigma_{V_w} \sqrt{2 T_s V_a / L_{k-1}} \; \nu_{V_w} $$
where $\sigma_{V_w}$ is the noise amplitude and $\nu_{V_w}$ represents Gaussian white noise.
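The deterministic parts of the discrete-time updates above (Equations (20) through (28)) can be sketched as follows; the helper names are ours, the specific force is assumed pre-rotated into the navigation frame, and the spatial wavelength is taken as a scalar for simplicity:

```python
import numpy as np

def quat_mult(p, q):
    """Hamilton product p (x) q for quaternions stored as [w, x, y, z]."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    ])

def propagate(q, V, P, V_w, omega, f_sf_n, g_n, V_a, L, Ts):
    """One time-update step of Eqs. (20)-(28), noise terms omitted."""
    dq = np.concatenate(([1.0], 0.5 * omega * Ts))   # small-angle delta-quaternion
    q_k = quat_mult(q, dq)
    q_k /= np.linalg.norm(q_k)                       # keep the quaternion unit-norm
    V_k = V + (f_sf_n + g_n) * Ts                    # Eq. (24), simplified
    P_k = P + 0.5 * (V_k + V) * Ts                   # Eq. (25), trapezoidal
    V_w_k = V_w - Ts * V_a * V_w / L                 # Eq. (28), deterministic part
    return q_k, V_k, P_k, V_w_k
```

With zero angular rate and a specific force balancing gravity, the attitude and velocity are unchanged while the position integrates the constant velocity.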

3.2.2. Measurement Model

The frequencies of the measurements differ, and if all measurements are processed uniformly, the solution will diverge due to the sparsity of the measurement matrix [35]. Thus, this paper employs distributed fusion correction according to the sampling frequency of the measurements. In this study, the measurements ($\tilde{z}_k^i$) for integrated navigation include the barometric altitude ($alt_{baro}$), pitot airspeed ($V_p$), and the real-time outputs inferred by the neural network ($\xi_{nn}$ and $V_{nn}$). The measurement equations for the different measurements are listed as follows:
$$ \tilde{z}_k^1 = alt_{baro} = h^1\left( x_k, u_k \right) + \eta_k^1 $$
$$ \tilde{z}_k^2 = V_p = h^2\left( x_k, u_k \right) + \eta_k^2 $$
$$ \tilde{z}_k^3 = \begin{bmatrix} \xi_{nn} \\ V_{nn} \end{bmatrix} = h^3\left( x_k, u_k \right) + \eta_k^3 $$
where $h^i(\cdot)$ represents the measurement equation of $\tilde{z}_k^i$.

3.2.3. Implementation for EKF

In order to reduce the computational burden on the airborne navigation module while dealing with nonlinear dynamics, the EKF framework is used for fusion processing. For the navigation system using EKF, the process consists of time updates and measurement updates.
(1) Time update
The predicted state vector ( x ^ k | k 1 ) is calculated using Equation (33), which is calculated based on the dynamic equations (Equations (20)–(29)).
$$ \hat{x}_{k|k-1} = F_{k|k-1} \hat{x}_{k-1|k-1} $$
where $F_{k|k-1}$ is the state transition matrix.
The predicted corresponding state covariance matrix ( P ^ k | k 1 ) can be obtained as follows:
$$ \hat{P}_{k|k-1} = F_{k|k-1} \hat{P}_{k-1|k-1} F_{k|k-1}^T + Q_{k-1|k-1} $$
where $Q_{k-1|k-1}$ is the process covariance matrix.
(2) Measurement update
The sampling frequencies of the sensors mounted on small fixed-wing UAVs are shown in Section 4.1. When any measurement is updated, the filter operates according to Equations (35)–(37).
The filtering gain is calculated as follows:
$$ K_{k|k-1}^i = \hat{P}_{k|k-1} H_{k|k-1}^{i,T} \left( H_{k|k-1}^i \hat{P}_{k|k-1} H_{k|k-1}^{i,T} + R^i \right)^{-1} $$
where $H_{k|k-1}^i$ is the Jacobian matrix derived from the sensor's measurement equations, $R^i$ is the sensor noise covariance matrix, and $i$ is the index of the sensor.
The corrected state vector ( x ^ k | k i ) and corrected covariance matrix ( P ^ k | k i ) are estimated as follows:
$$ \hat{x}_{k|k}^i = \hat{x}_{k|k-1} + K_{k|k-1}^i \left( z_k^i - h^i\left( x_k \right) \right) $$
$$ \hat{P}_{k|k}^i = \left( I - K_{k|k-1}^i H_{k|k-1}^i \right) \hat{P}_{k|k-1} $$
where I represents the identity matrix.
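As a sketch of Equations (35) through (37), a single-sensor EKF measurement update in numpy (the function name is hypothetical):

```python
import numpy as np

def ekf_measurement_update(x_pred, P_pred, z, h, H, R):
    """One EKF measurement update for sensor i (Eqs. 35-37).

    x_pred, P_pred: predicted state and covariance;
    z: measurement; h: predicted measurement h^i(x); H: Jacobian; R: noise cov.
    """
    S = H @ P_pred @ H.T + R                        # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)             # Eq. (35): filter gain
    x_upd = x_pred + K @ (z - h)                    # Eq. (36): state correction
    P_upd = (np.eye(len(x_pred)) - K @ H) @ P_pred  # Eq. (37): covariance update
    return x_upd, P_upd
```

For a scalar state with unit prior and measurement variances, the gain is 0.5, so the corrected state lands halfway between prediction and measurement.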
In order to utilize the pseudo-measurements of SALSTM and other sensor information within the EKF framework, it is necessary to derive the Jacobian matrices for the measurement update. For the different measurements, $H_{k|k-1}^i$ is derived by the following equations.
The correlation between $alt_{baro}$ and $x_k$ is explicitly evident, with its measurement matrix expressed as follows:
$$ H_{k|k-1}^1 = \partial h^1\left( x_k, u_k \right) / \partial x_k = \begin{bmatrix} 0_{1\times7} & 0 & 0 & 1 & 0_{1\times9} \end{bmatrix} $$
The pitot airspeed sensor measures the UAV's velocity relative to the surrounding air. That is, the pitot airspeed ($V_p$) is related to both the ground velocity and the wind field as follows:
$$ H_{k|k-1}^2 = \partial h^2\left( x_k, u_k \right) / \partial x_k = \begin{bmatrix} 0_{1\times4} & H_{V,1\times3}^{V_p} & 0_{1\times9} & H_{V_w,1\times3}^{V_p} \end{bmatrix} $$
The pitot airspeed is generally insensitive to small flow angles. It is approximately equal to the magnitude of the relative velocity vector between the UAV and the surrounding air. According to the “wind triangle”, the observation equation for the pitot airspeed is expressed as follows:
$$ V_p = \left\| V^n - V_w^n \right\| $$
Therefore,
$$ H_V^{V_p} = \partial V_p / \partial V^n = \begin{bmatrix} \dfrac{V^N - V_w^N}{\left\| V^n - V_w^n \right\|} & \dfrac{V^E - V_w^E}{\left\| V^n - V_w^n \right\|} & \dfrac{V^D}{\left\| V^n - V_w^n \right\|} \end{bmatrix} $$
$$ H_{V_w}^{V_p} = \partial V_p / \partial V_w^n = \begin{bmatrix} -\dfrac{V^N - V_w^N}{\left\| V^n - V_w^n \right\|} & -\dfrac{V^E - V_w^E}{\left\| V^n - V_w^n \right\|} \end{bmatrix} $$
where superscripts $N$, $E$, and $D$ represent the components of the variables in the NED navigation frame.
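These analytic rows can be verified by finite differences. The sketch below uses hypothetical names and assumes a two-component horizontal wind state; the minus signs in $H_{V_w}^{V_p}$ follow from differentiating $V_p = \|V^n - V_w^n\|$ with respect to the wind:

```python
import numpy as np

def pitot_jacobians(V_n, V_w_n):
    """Analytic rows of Eqs. (41)-(42) for the pitot airspeed measurement.

    V_n: ground velocity (N, E, D); V_w_n: horizontal wind (N, E).
    """
    d = V_n - np.append(V_w_n, 0.0)   # wind state has no down component
    nrm = np.linalg.norm(d)
    H_V = d / nrm                     # dV_p / dV^n   (1x3)
    H_Vw = -d[:2] / nrm               # dV_p / dV_w^n (1x2)
    return H_V, H_Vw
```

A quick numerical check confirms the signs and magnitudes of both rows.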
For the attitude and velocity inferred by the neural network, the Jacobian matrix derived from the measurement equations is expressed as follows:
$$ H_{k|k-1}^3 = \partial h^3\left( x_k, u_k \right) / \partial x_k = \begin{bmatrix} H_{q,1\times4}^\phi & 0_{1\times3} & 0_{1\times11} \\ H_{q,1\times4}^\theta & 0_{1\times3} & 0_{1\times11} \\ H_{q,1\times4}^\psi & 0_{1\times3} & 0_{1\times11} \\ 0_{3\times4} & I_{3\times3} & 0_{3\times11} \end{bmatrix} $$
where H q ϕ , H q θ , and H q ψ are derived in detail in Appendix A.

3.3. SALSTM Network Design

In this study, the neural network is used to map the sequences $z_i^*$ and $u_i^*$ to the optimal $x_i^*$ based on the collected data, thereby avoiding the need for online optimization and solving the objective function. Referring to the estimation model in Section 2, the values of $z_i^*$ and $u_i^*$ are treated as the neural network's input ($u_i$), which is used to infer the system state ($x_i^*$) as follows:
$$ u_i = \begin{bmatrix} \left[ V_{p,i}, M_i^x, M_i^y, M_i^z \right]^T \\ \omega_i \\ f_{sf,i} \end{bmatrix} $$
$$ x_i^* = \begin{bmatrix} \xi_i \\ V_i^n \end{bmatrix} $$
The role of the neural network is to establish an optimal mapping during training, then use this mapping during deployment to infer the system state, approximating the optimal value.

3.3.1. SALSTM Architecture

We consider the attitude and velocity estimation problem described in Section 2 as a time-series regression task and design the SALSTM network architecture. The SALSTM architecture leverages LSTM units for their ability to capture temporal dependencies and self-attention for its strength in modeling contextual relationships.
Given that temporal continuity of motion states is a critical constraint in dynamic models of UAVs, LSTM components are incorporated into SALSTM instead of using a pure self-attention structure (transformer). By leveraging the recurrent structure of LSTM in both the input and output layers, we effectively capture and maintain strict temporal continuity. The primary reason for not adopting a more computationally efficient GRU network is to leverage LSTM’s cell state to propagate time-continuous states such as wind and magnetic fields.
As shown in Figure 5, SALSTM consists of four layers. Its objective is to learn a function that maps the input sequence $\{ u_{k-n}, \ldots, u_{k-1}, u_k \}$ to its corresponding output sequence $\{ x_{k-m}^*, \ldots, x_{k-1}^*, x_k^* \}$ from the training set, where $n$ is the length of the input sequence and $m$ is the length of the output sequence. The input vector ($u_k \in \mathbb{R}^{10}$) consists of $f_{sf}$, $\omega$, $V_p$, and $M_{mag}$, and the output ($x_k^* \in \mathbb{R}^6$) comprises the collected ground-truth values ($\hat{\xi}$ and $\hat{V}$).
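For illustration, the four-layer data flow can be sketched in numpy with randomly initialized weights. The layer width, sequence lengths, and helper names below are our assumptions, not the paper's configuration; only the shapes and the LSTM, LSTM, self-attention, LSTM ordering mirror the text:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def lstm_layer(X, d_h):
    """Run one LSTM layer over sequence X of shape (T, d_in) -> (T, d_h).
    Weights are random: this only illustrates shapes and data flow."""
    T, d_in = X.shape
    Wx = rng.normal(0, 0.1, (d_in, 4 * d_h))
    Wh = rng.normal(0, 0.1, (d_h, 4 * d_h))
    b = np.zeros(4 * d_h)
    h, c = np.zeros(d_h), np.zeros(d_h)
    H = np.zeros((T, d_h))
    for t in range(T):
        z = X[t] @ Wx + h @ Wh + b
        i, f, o = sigmoid(z[:d_h]), sigmoid(z[d_h:2*d_h]), sigmoid(z[2*d_h:3*d_h])
        g = np.tanh(z[3*d_h:])
        c = f * c + i * g          # cell state carries long-term memory
        h = o * np.tanh(c)         # hidden state
        H[t] = h
    return H

def self_attention(H):
    """Scaled dot-product self-attention with Q = K = V = H."""
    scores = H @ H.T / np.sqrt(H.shape[1])
    W = np.exp(scores - scores.max(axis=1, keepdims=True))
    W /= W.sum(axis=1, keepdims=True)       # row-wise softmax
    return W @ H

def salstm_forward(u_seq, d_h=32, m=5):
    """LSTM -> LSTM -> self-attention -> LSTM, then a linear head to R^6."""
    H = lstm_layer(lstm_layer(u_seq, d_h), d_h)
    A = self_attention(H)
    out = lstm_layer(A, d_h)[-m:]           # keep an output sequence of length m
    W_out = rng.normal(0, 0.1, (d_h, 6))
    return out @ W_out                      # attitude + velocity sequence

x_star = salstm_forward(rng.normal(size=(20, 10)))   # n = 20, u_k in R^10
```

The result is an output sequence of length `m` with six channels per step, matching the attitude/velocity output described above; only the final step would be fused in the EKF.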

3.3.2. LSTM Component

LSTM units are RNNs capable of selectively retaining information over long sequences, proposed to solve the vanishing gradient problem present in older variants. An LSTM unit is based on three main gates (an input gate, a forget gate, and an output gate) in addition to a cell state [36]. A detailed illustration of an LSTM unit is presented in Figure 6. The input gate controls which new information is collected, the forget gate decides which information to forget and which to store, and the output gate updates the unit's output. These gates allow for the control of information flow in the unit. The LSTM unit can be computed by calculating the different gates. At instant $k$, the input gate is $i_k$, the forget gate is $l_k$, and the output gate is $o_k$. These gates can be computed as follows:
$$ \begin{aligned} i_k &= \sigma_g\left( w_i x_k + u_i h_{k-1} + b_i \right) \\ l_k &= \sigma_g\left( w_f x_k + u_f h_{k-1} + b_f \right) \\ o_k &= \sigma_g\left( w_o x_k + u_o h_{k-1} + b_o \right) \end{aligned} $$
where $h_k$ is the hidden state; $c_k$ is the cell state; and $w_*$, $u_*$, and $b_*$ are the network parameter matrices to be learned.
The LSTM units are applied in the first, second, and fourth layers. The first and second LSTM layers are used to extract the autocorrelation features within the input sequences. More importantly, the LSTM layers can determine the complex temporal dynamics of the UAV’s movement through the hidden state ( h k ) and cell state ( c k ). The recursive nature of the LSTM layer effectively models the dynamic properties (Equation (8)). Cells within a single LSTM layer share the same structure and parameters, which are thought to reflect the time-invariant dynamic parameters ( p * ) of the UAV. The use of a two-layer stacked LSTM structure enables better capture of the underlying features and complex dependencies.
The fourth LSTM layer serves as the output layer. Its primary function is to recover the recursive relationship within the output sequence $\{ x_{k-m}^*, \ldots, x_{k-1}^*, x_k^* \}$ based on the weight connections within the self-attention layer. According to the state estimation model described in Section 2.4, the recursive property of the outputs is modeled as time-continuous constraints. Therefore, we set the number of timesteps in the output layer to $m$, where $m$ is selectable between 2 and $n$; that is, the output of the neural network during both training and inference is a sequence of length $m$, although only the final output ($x_k^*$) is fused in the EKF framework. In contrast, if the output is specified for a single timestep, learning the continuity constraints within the outputs becomes challenging.

3.3.3. Self-Attention Component

The self-attention layer is inserted as the third layer, with the primary objective of mitigating the vanishing gradient problem and extracting information from subtle dynamic changes of the UAV. Due to the recursive nature of the LSTM layer, the final output of an LSTM cell is more reliant on recent data. By introducing the self-attention mechanism, a shortcut connection is established between the output layer and all hidden states of the second LSTM layer. The weights of these shortcut connections can be adaptively adjusted for each hidden state, effectively preventing the inference from producing anomalous values.
In the SALSTM architecture, the data input to the third layer comprise the complete set of hidden states output from all units in the second LSTM layer, which fundamentally necessitates the incorporation of the self-attention mechanism. The self-attention mechanism processes three fundamental components—query ( Q * ), key ( K * ), and value ( V * )—ultimately producing a new sequence enriched with interactive information [37]. Given that Q * and K * share identical sequence lengths in SALSTM, the scaled dot-product attention mechanism is implemented to enhance computational efficiency, numerical stability, and representational capacity. The computational procedure of self-attention comprises four sequential operations: (1) computing the dot product between query and key matrices, (2) scaling the result by the square root of the dimension of keys ( d k ), (3) applying the softmax function to generate the weight matrix, and (4) computing the final output through the dot product between the weight matrix and value matrix. The output is mathematically expressed as follows:
$$\mathrm{Attention}\left(Q^{*}, K^{*}, V^{*}\right) = \mathrm{Softmax}\!\left(\frac{Q^{*} {K^{*}}^{\mathsf{T}}}{\sqrt{d_k}}\right) V^{*}$$
where $\mathrm{Softmax}(A)_i = e^{A_i} \big/ \sum_{j=k-n}^{k} e^{A_j}$ and $A_i$ represents the i-th element of the input sequence.
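The four operations above can be sketched directly in NumPy. This is generic scaled dot-product attention, not the trained SALSTM layer; the sequence length and feature dimension are illustrative, and the max-subtraction step is a standard numerical-stability trick for the softmax.

```python
import numpy as np

def self_attention(Q, K, V):
    """Scaled dot-product attention: Softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (1) dot product, (2) scaling
    scores -= scores.max(axis=-1, keepdims=True)     # stability: subtract row max
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # (3) softmax weight matrix
    return weights @ V                               # (4) weighted sum of values

# In SALSTM, Q, K, and V would all be derived from the second LSTM layer's
# hidden-state sequence; the sizes here are illustrative.
rng = np.random.default_rng(1)
h_seq = rng.normal(size=(10, 16))    # 10 hidden states of dimension 16
out = self_attention(h_seq, h_seq, h_seq)
```

Each output row is a convex combination of all value rows, which is what establishes the shortcut connections from the output layer to every hidden state of the preceding LSTM layer.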

4. Experimental Section

This section evaluates SALSTM-enhanced integrated navigation during an artificially induced GNSS outage. The evaluation is conducted on two datasets collected during field flight tests.

4.1. Hardware Configuration

The datasets used in this study were collected by Talon, a fixed-wing small UAV, as illustrated in Figure 7. Talon is equipped with a BMI160 six-axis inertial measurement unit, an MXT906AM GPS receiver, a magnetometer, a barometer, and a pitot tube. The detailed parameters of these sensors are listed in Table 1. The raw data from these sensors are read and stored by a Pixhawk 2.1 hardware module within the PX4 1.8.0 software framework.
In the PX4 software framework, the data of the inertial sensors are fused with data from the GPS module, achieving a final accuracy of 0.8° in attitude, 0.1 m/s in velocity, and 2.5 m in position [35,38]. These GPS-fused navigation states are also stored for SALSTM training or as a reference for testing.

4.2. Field Test Setup

In this study, two field flights were conducted using the Talon platform with the integrated navigation system under test. Complete sensor data were collected and stored for both flights. One of the flights provided 12 min of valid data, consisting of 180,153 IMU data points, which were used to train and validate the SALSTM model. Talon performed various dynamically rich maneuvers during flight to ensure that the collected data covered a broad spectrum of possible flight patterns, which is crucial for learning [1]. The data from the other flight contained 162,835 IMU data points over 11 min, which were used for an independent test of the method's performance. Both field flight tests were conducted in Anhui Province, China, in a standard airplane flight field within legal airspace.
The procedure was the same for both flight tests. First, the structure and electrical components of Talon were checked. The UAV navigation states were initialized using the three-axis onboard accelerometer, gyroscope, and magnetometer, together with the PX4 autopilot initialization algorithm. Then, Talon took off remotely in "stabilized" mode and subsequently switched to "mission" mode to fly autonomously along the preset routes (shown in Figure 8). After the switch to "mission" mode, flight data recording began, ensuring stable acquisition conditions and guaranteeing the validity of the measurement data.

4.3. Model Pre-Training

After the completion of the two flights, we pre-trained the SALSTM model using data from the first flight. To evaluate the performance of the proposed algorithm in GNSS-denied environments, we first compare SALSTM with three baselines: LSTM, hierarchical LSTM (HLSTM), and Transformer. All three comparison networks share the same inputs and outputs as SALSTM.
The collected data were interpolated based on time stamps to ensure temporal continuity, then downsampled to 50 Hz for efficient utilization. Min–max normalization was applied to both the input and output data of SALSTM across all dimensions. A 6:4 split ratio was applied to divide dataset 1 into training and validation sets with sizes of 19,557 and 13,039 samples, respectively. It should be noted that after deployment, the outputs of SALSTM were rescaled using the normalization parameters to revert to their original scales.
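A minimal sketch of this preprocessing step — per-dimension min–max normalization with stored parameters for later rescaling, followed by the 6:4 split. The sample count and dimensionality below are chosen only so that the split reproduces the stated sizes (19,557 and 13,039); the truncating split index is an assumption about how the ratio was applied.

```python
import numpy as np

def minmax_fit(data):
    """Per-dimension min-max parameters, kept so outputs can be rescaled later."""
    return data.min(axis=0), data.max(axis=0)

def minmax_apply(data, lo, hi):
    return (data - lo) / (hi - lo)

def minmax_invert(scaled, lo, hi):
    # After deployment, network outputs are rescaled back to physical units.
    return scaled * (hi - lo) + lo

rng = np.random.default_rng(2)
samples = rng.normal(size=(32596, 9))   # 19,557 + 13,039 samples; 9 dims (illustrative)
lo, hi = minmax_fit(samples)
scaled = minmax_apply(samples, lo, hi)
n_train = int(len(samples) * 0.6)       # 6:4 train/validation split
train, val = scaled[:n_train], scaled[n_train:]
```

Storing `lo` and `hi` alongside the trained model is what makes the post-deployment rescaling step possible.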
As indicated in Table 2, the mean square error (MSE) cost function, given by Equation (48), was chosen for learning. We utilized a regularization approach known as dropout to prevent the SALSTM model from overfitting. The initial learning rate was set to 0.01, with a decay factor of 0.7 and a minimum learning rate of 10 5 . Each method’s training process was set to a maximum of 500 epochs, while early stopping was triggered if the loss function did not decrease significantly. The model parameters were empirically tuned to achieve the best feasible accuracy within the designed SALSTM architecture.
$$\mathrm{MSE} = \frac{1}{o} \sum_{i=1}^{o} \sum_{j=k-m}^{k} \left| x_j^{*} - y_j \right|^2$$
where $x_j^{*}$ and $y_j$ denote the estimated and observed outputs, respectively, and $o$ is the batch size used during training.
Figure 9 depicts the accuracy achieved during the training procedure. The criterion for evaluating learning accuracy is defined as [1]
$$\mathrm{Accuracy} = 1 - \tan^{-1}(\mathrm{MSE})$$
Compared with the other networks, Figure 9 indicates that the SALSTM model adapts better to the training data. On both the training and validation sets, SALSTM not only shows the highest learning efficiency but also achieves the highest accuracy. Compared to the Transformer model [30,31], SALSTM preserves the time-continuous structure of the navigation state in both the input and output layers, while compared to simply stacking LSTM layers [19] and to HLSTM [39], the self-attention layer inserted in the middle of SALSTM establishes cross-time and cross-space connections among the measurement values, thereby avoiding the gradient vanishing caused by measurement noise and long-sequence learning.
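The two training metrics — the sequence MSE cost and the accuracy criterion Accuracy = 1 − tan⁻¹(MSE) — can be sketched as follows. The batch size, sequence length, and output dimension here are illustrative, not the paper's values.

```python
import numpy as np

def sequence_mse(x_est, y_obs):
    """Batch MSE over output sequences: summed squared error across the m
    output timesteps, averaged over the batch (indices illustrative)."""
    o = x_est.shape[0]                      # batch size
    return np.sum((x_est - y_obs) ** 2) / o

def accuracy(mse):
    """Learning-accuracy criterion plotted in Figure 9: 1 - arctan(MSE)."""
    return 1.0 - np.arctan(mse)

y = np.zeros((4, 10, 6))                    # batch of 4, m = 10 steps, 6 outputs
x = np.full_like(y, 0.01)                   # uniform 0.01 estimation error
mse = sequence_mse(x, y)
acc = accuracy(mse)
```

Because arctan is bounded, the accuracy criterion compresses large losses gracefully while remaining sensitive near MSE = 0.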
Real-time performance is critical for navigation systems. The inference times of the four models are presented in Table 3. These times are considered acceptable for two reasons. First, the navigation software incorporates a caching mechanism to handle measurement delays and ensure temporal alignment. Second, the SALSTM inference results are fused within the EKF framework, enabling the final navigation-state output to exceed 200 Hz.
The measurement errors are assumed to be independently distributed. The covariances of ξ_nn and V_nn are constructed from the MSE of the SALSTM inference results obtained during training. The covariance of the measurement vector ( z_3 ) is set as follows:
$$R_3 = \mathrm{diag}\left(0.04^2,\ 0.02^2,\ 0.09^2,\ 0.25^2,\ 1.26^2,\ 0.47^2\right)$$
where diag(a) denotes the diagonal matrix with the elements of a on its main diagonal.
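Constructing this covariance from per-channel standard deviations is a one-liner. The sketch below assumes the six channels are ordered as roll, pitch, yaw, V_N, V_E, V_D, matching the pseudo-measurement vector.

```python
import numpy as np

# Measurement covariance R3 built from per-channel standard deviations
# (attitude channels first, then NED velocity channels).
stds = np.array([0.04, 0.02, 0.09, 0.25, 1.26, 0.47])
R3 = np.diag(stds ** 2)
```

The independence assumption stated in the text is exactly what the diagonal structure encodes: all cross-channel covariances are zero.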

4.4. Test Results

To validate the proposed method, the model trained on dataset 1 in Section 4.3 was incorporated into the navigation framework described in Section 3.1. For comparison with the three other methods, this study utilized the PX4 open-source offline navigation replay module ("PX4 replay"). This module employs the same navigation algorithm as the Talon airborne navigation module, except for the neural network inference fusion component. To avoid errors caused by overfitting, the sensor data used for replay were collected during the second flight (dataset 2). As reference true values, velocity and position were obtained from GNSS measurements, while attitude was derived from INS/GNSS fusion estimation results.
We selected HLSTM to represent the best performance of RNN-based networks and the Transformer to represent the best performance of purely attention-based architectures. During the replay, GNSS signals were intentionally blocked at 326 s by preconfiguring the PX4 replay parameters, while the pitot tube, INS, and barometer continued to operate normally. During this outage, the navigation system was aided by SALSTM, which provided ( ϕ_nn, θ_nn, ψ_nn, V_nn^N, V_nn^E, V_nn^D ) as pseudo-measurements.
(1) Attitude and heading
In dataset 2, the Talon UAV executed multiple turns and pitch maneuvers. Throughout the 640 s GNSS outage, the roll angle exhibited fluctuations within the range of −30° to 30°, while the pitch angle varied between −10° and 20°. As shown in Figure 10, the three neural network-enhanced integrated navigation strategies demonstrated exceptional robustness, with no discernible cumulative errors in attitude estimation. Conversely, the EKF approach, devoid of neural network support, manifested pronounced cumulative errors in attitude estimation. Both the SALSTM and HLSTM methodologies achieved accurate heading estimation, in contrast to the Transformer method, which yielded suboptimal results. Notably, the standalone EKF was still capable of heading estimation due to the availability of the magnetometer. Table 4 presents the statistical characteristics of the estimation performance for the three neural networks and the EKF framework.
Figure 11 illustrates the detailed performance of the roll-angle estimates. Compared to HLSTM and Transformer, the proposed SALSTM network achieved superior performance, particularly in roll-angle estimation. As shown in the figure, this advantage is primarily evident when the roll angle remains relatively small—a condition under which the dynamic characteristics of the UAV are less pronounced and more challenging to capture. The recursive nature of LSTM makes it prone to gradient vanishing, whereas the self-attention mechanism effectively captures these subtle dynamic changes. On the other hand, the Transformer model, which relies solely on the self-attention mechanism as its core component, also underperformed. Due to the absence of explicit recurrent architectures, Transformers face inherent limitations in preserving the temporal continuity required for dynamic systems modeling.
(2) Velocity
During the test, the Talon's flight speed fluctuated between 15.9 and 23.3 m/s. Figure 12 depicts the velocity estimation results of the various methods. The SALSTM-based navigation system remains the most accurate in long-term velocity estimation, followed by the HLSTM network, while the Transformer model exhibits the poorest performance. Table 5 presents the statistical characteristics of the velocity estimation performance for the three neural networks and the EKF.
Specifically, the diverging velocity estimates of the conventional EKF are attributed to the stand-alone INS calculations, which result in error accumulation. The Transformer model performs significantly better in estimating northward velocity than eastward velocity, a phenomenon that warrants further analysis. Nevertheless, the Transformer’s results remain unacceptable, with a root mean square error (RMSE) as high as 9 m/s. In contrast, the velocity estimations of HLSTM and SALSTM are considerably more accurate.
Compared to HLSTM, SALSTM achieves more precise estimation under conditions of slow speed variations (as shown in Figure 13). This is because in such scenarios, the dynamic characteristics of the UAV are difficult to excite. The self-attention layers integrated into the LSTM architecture effectively capture the subtle dynamic changes of the UAV.
(3) Position
Although the neural networks investigated in this study do not provide an inference value of position directly, the integrated navigation system can still achieve position estimation through velocity integration. The test data segment corresponds to a quadrilateral flight path, including multiple climbs and descents within a 1000 × 2000 × 100 m space.
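Position recovery by velocity integration can be sketched as a simple forward-Euler dead-reckoning loop. The actual EKF propagation in the paper is more elaborate; the 50 Hz rate and the constant-velocity profile below are illustrative assumptions.

```python
import numpy as np

def integrate_position(p0, velocities, dt):
    """Dead-reckon NED position by accumulating velocity estimates.

    A forward-Euler sketch of how position follows from velocity when no
    direct position measurement is available to the filter.
    """
    return p0 + np.cumsum(velocities * dt, axis=0)

dt = 1.0 / 50.0                                      # 50 Hz, the downsampled rate
v = np.tile(np.array([18.0, 0.0, 0.0]), (500, 1))    # 10 s of level northward flight
p = integrate_position(np.zeros(3), v, dt)
# Final northward displacement: 18 m/s * 10 s = 180 m
```

This also makes the error behavior obvious: any bias in the velocity estimate integrates into a position error that grows linearly with time, which is why reducing NED velocity errors directly bounds the cumulative position drift.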
As shown in Figure 14 and Figure 15, the position estimates of the HLSTM and SALSTM methods remain stable, whereas the EKF and Transformer methods exhibit anomalous position estimates. The SALSTM-enhanced integrated navigation system again demonstrates the highest accuracy, with a horizontal position error of approximately 141 m. Considering the total flight duration of 640 s and a flight distance of 11,520 m, this corresponds to an error ratio of approximately 0.013 of the distance flown. This is attributed to the measurements provided by SALSTM, which significantly reduce the NED velocity errors in the integrated navigation system, thereby substantially decreasing the cumulative position errors. Finally, owing to the integration of barometer data, all methods achieve high-precision altitude estimation. Table 6 presents the statistical characteristics of the position estimation performance for the three neural networks and the EKF.
Since there is no direct measurement available to correct the position in integrated navigation, the position error accumulates indefinitely over time. Therefore, the maximum accumulated position error after a continuous flight of 11 min is presented in Table 7.

5. Conclusions and Discussion

Small fixed-wing UAVs equipped with MEMS-based inertial measurement units are prone to error accumulation during navigation in GNSS-denied environments. To address this issue, this study proposes a hybrid neural network, SALSTM, which combines self-attention mechanisms with LSTM to enhance integrated navigation performance. We first model the estimation task based on UAV aerodynamics and kinematics, which enables us to precisely define the inputs and outputs for SALSTM mapping. To improve performance when the UAV’s dynamics are not well-excited, a self-attention mechanism is incorporated into SALSTM to capture long-range dependencies in subtle dynamic changes. The self-attention layers are embedded within multiple LSTM layers, and the output layer is designed to generate state sequences, leveraging the recursive nature of LSTM to enforce state continuity constraints.
The real-time inference of attitude and velocity from SALSTM is fused into the EKF framework for integrated navigation. A distributed fusion framework is adopted to integrate multi-source measurements. To facilitate deployment on low-performance onboard hardware, we derive the Jacobian matrix of Euler attitude measurements.
The SALSTM-enhanced integrated navigation method is validated using data from two flight tests. Dataset 1 is used for training and validation, while dataset 2 serves as an independent test. The results demonstrate that the SALSTM-enhanced integrated navigation system achieves superior long-term stability, delivering consistent attitude and velocity estimation over extended periods while significantly improving position estimation accuracy. Over a 640 s duration, the root mean square errors for attitude and velocity are 1.36° and 1.06 m/s, respectively, with a total cumulative position error of approximately 162 m, or about 1.3% of the total flight distance. This represents an improvement of more than 50% over the best existing methods.
Currently, the modeling approach treats wind field parameters as unknown constants due to the lack of real-time wind field measurements in this study. If future studies incorporate measurement tools to collect real-time wind field data, the trained model could also be applied to wind field estimation. Nevertheless, when wind field and geomagnetic parameters change, SALSTM requires online training using local flight data to adjust network parameters. Based on the modeling and analysis performed in this study, the SALSTM-enhanced navigation method is applicable to various fixed-wing UAVs. However, the current approach is not suitable for rotorcraft UAVs due to their distinct flight dynamics models. Additionally, further research is needed to investigate SALSTM’s noise modeling and the accuracy of its outputs under different sensor configurations. Moreover, the method’s generalization capability in scenarios not covered by the training data requires validation, particularly in high-maneuverability and near-stall flight conditions. We recommend collecting targeted flight data for training in these specific scenarios.

Author Contributions

Conceptualization, X.S. and J.L. (Jie Li); Methodology, Z.W.; Software, Z.W.; Validation, X.W.; Formal analysis, J.L. (Juan Li); Investigation, J.L. (Jie Li) and Y.Y.; Resources, J.L. (Jie Li) and Y.Y.; Data curation, X.W.; Writing—review and editing, X.S. and J.L. (Juan Li); Supervision, X.S. and Y.Y.; Funding acquisition, J.L. (Juan Li). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62373053.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

F a Aerodynamic force
V Ground velocity
m UAV mass
f s f Specific force
a Acceleration
C * * Rotation matrix
g Gravitational acceleration
F T Thrust
G Gravity
Q ¯ Dynamic pressure
S Reference area
C * * Aerodynamic derivatives
α Angle of attack
β Sideslip angle
V a Airspeed
ρ Air density
V w Wind velocity
M I Geomagnetic vector
T s Sample time interval
V p Pitot airspeed
q Quaternion
P Position in north–east–down frame
L , D , Y Lift, drag, and lateral forces
ξ or [ϕ, θ, ψ]ᵀ Euler attitude
V_r^b or [u, v, w]ᵀ Relative velocity in the body frame
ω or [p, q, r]ᵀ Angular rates
M_mag or [M_x, M_y, M_z]ᵀ Magnetic field intensity measured by the magnetometer
Superscripts:
b Values in the body frame
n Values in the navigation frame
a Values in the wind frame

Appendix A

Since
$$\phi = \tan^{-1}\frac{2\left(q_0 q_1 + q_2 q_3\right)}{1 - 2\left(q_1^2 + q_2^2\right)}$$
$$\theta = \sin^{-1}\left(2\left(q_0 q_2 - q_3 q_1\right)\right)$$
$$\psi = \tan^{-1}\frac{2\left(q_0 q_3 + q_1 q_2\right)}{1 - 2\left(q_2^2 + q_3^2\right)}$$
Therefore
$$H_{q,1\times 4}^{\phi} = \frac{\partial \phi}{\partial q} = \begin{bmatrix} H_{q_0}^{\phi} & H_{q_1}^{\phi} & H_{q_2}^{\phi} & H_{q_3}^{\phi} \end{bmatrix}$$
where
$$H_{q_0}^{\phi} = \frac{2 q_1}{\left(1 - 2\left(q_1^2 + q_2^2\right)\right)\left(1 + \dfrac{4\left(q_0 q_1 + q_2 q_3\right)^2}{\left(1 - 2\left(q_1^2 + q_2^2\right)\right)^2}\right)}$$
$$H_{q_1}^{\phi} = \frac{2\left(q_0 + 2 q_0 q_1^2 - 2 q_0 q_2^2 + 4 q_1 q_2 q_3\right)}{1 + 4 q_1^4 + 4 q_2^4 + 4 q_1^2\left(-1 + q_0^2 + 2 q_2^2\right) + 8 q_0 q_1 q_2 q_3 + 4 q_2^2\left(-1 + q_3^2\right)}$$
$$H_{q_2}^{\phi} = \frac{2\left(q_3 - 2 q_1^2 q_3 + 2 q_2^2 q_3 + 4 q_0 q_1 q_2\right)}{1 + 4 q_1^4 + 4 q_2^4 + 4 q_1^2\left(-1 + q_0^2 + 2 q_2^2\right) + 8 q_0 q_1 q_2 q_3 + 4 q_2^2\left(-1 + q_3^2\right)}$$
$$H_{q_3}^{\phi} = \frac{2 q_2}{\left(1 - 2\left(q_1^2 + q_2^2\right)\right)\left(1 + \dfrac{4\left(q_0 q_1 + q_2 q_3\right)^2}{\left(1 - 2\left(q_1^2 + q_2^2\right)\right)^2}\right)}$$
Meanwhile,
$$H_{q,1\times 4}^{\theta} = \frac{\partial \theta}{\partial q} = \begin{bmatrix} H_{q_0}^{\theta} & H_{q_1}^{\theta} & H_{q_2}^{\theta} & H_{q_3}^{\theta} \end{bmatrix}$$
where
$$H_{q_0}^{\theta} = \frac{2 q_2}{\sqrt{1 - 4\left(q_0 q_2 - q_1 q_3\right)^2}}$$
$$H_{q_1}^{\theta} = \frac{-2 q_3}{\sqrt{1 - 4\left(q_0 q_2 - q_1 q_3\right)^2}}$$
$$H_{q_2}^{\theta} = \frac{2 q_0}{\sqrt{1 - 4\left(q_0 q_2 - q_1 q_3\right)^2}}$$
$$H_{q_3}^{\theta} = \frac{-2 q_1}{\sqrt{1 - 4\left(q_0 q_2 - q_1 q_3\right)^2}}$$
Meanwhile,
$$H_{q,1\times 4}^{\psi} = \frac{\partial \psi}{\partial q} = \begin{bmatrix} H_{q_0}^{\psi} & H_{q_1}^{\psi} & H_{q_2}^{\psi} & H_{q_3}^{\psi} \end{bmatrix}$$
where
$$H_{q_0}^{\psi} = \frac{-2 q_3}{\left(\dfrac{\left(2 q_0 q_3 + 2 q_1 q_2\right)^2}{\left(2 q_2^2 + 2 q_3^2 - 1\right)^2} + 1\right)\left(2 q_2^2 + 2 q_3^2 - 1\right)}$$
$$H_{q_1}^{\psi} = \frac{-2 q_2}{\left(\dfrac{\left(2 q_0 q_3 + 2 q_1 q_2\right)^2}{\left(2 q_2^2 + 2 q_3^2 - 1\right)^2} + 1\right)\left(2 q_2^2 + 2 q_3^2 - 1\right)}$$
$$H_{q_2}^{\psi} = -\left(\frac{2 q_1}{2 q_2^2 + 2 q_3^2 - 1} - \frac{4 q_2\left(2 q_0 q_3 + 2 q_1 q_2\right)}{\left(2 q_2^2 + 2 q_3^2 - 1\right)^2}\right) \Bigg/ \left(\frac{\left(2 q_0 q_3 + 2 q_1 q_2\right)^2}{\left(2 q_2^2 + 2 q_3^2 - 1\right)^2} + 1\right)$$
$$H_{q_3}^{\psi} = -\left(\frac{2 q_0}{2 q_2^2 + 2 q_3^2 - 1} - \frac{4 q_3\left(2 q_0 q_3 + 2 q_1 q_2\right)}{\left(2 q_2^2 + 2 q_3^2 - 1\right)^2}\right) \Bigg/ \left(\frac{\left(2 q_0 q_3 + 2 q_1 q_2\right)^2}{\left(2 q_2^2 + 2 q_3^2 - 1\right)^2} + 1\right)$$
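The Jacobians above can be sanity-checked numerically: differentiate the quaternion-to-Euler map by central finite differences and compare against the closed-form entries. The quaternion value below is an arbitrary near-level example, not taken from the paper.

```python
import numpy as np

def quat_to_euler(q):
    """Euler angles (roll, pitch, yaw) from a quaternion [q0, q1, q2, q3]."""
    q0, q1, q2, q3 = q
    phi = np.arctan2(2*(q0*q1 + q2*q3), 1 - 2*(q1**2 + q2**2))
    theta = np.arcsin(2*(q0*q2 - q3*q1))
    psi = np.arctan2(2*(q0*q3 + q1*q2), 1 - 2*(q2**2 + q3**2))
    return np.array([phi, theta, psi])

def numeric_jacobian(q, eps=1e-7):
    """Central-difference approximation of the 3x4 attitude Jacobian."""
    J = np.zeros((3, 4))
    for i in range(4):
        dq = np.zeros(4)
        dq[i] = eps
        J[:, i] = (quat_to_euler(q + dq) - quat_to_euler(q - dq)) / (2 * eps)
    return J

q = np.array([0.9830, 0.1, 0.1, 0.1])   # near-level attitude (illustrative)
J = numeric_jacobian(q)
```

Agreement between `J` and the analytic entries confirms the derivation used for the low-cost onboard measurement update.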

References

  1. Taghizadeh, S.; Safabakhsh, R. An Integrated INS/GNSS System with an Attention-Based Hierarchical LSTM during GNSS Outage. GPS Solut. 2023, 27, 71. [Google Scholar] [CrossRef]
  2. Daneshmand, S.; Lachapelle, G. Integration of GNSS and INS with a Phased Array Antenna. GPS Solut. 2018, 22, 3. [Google Scholar] [CrossRef]
  3. Li, M.; Hu, T. Deep Learning Enabled Localization for UAV Autolanding. Chin. J. Aeronaut. 2021, 34, 585–600. [Google Scholar] [CrossRef]
  4. Pasqualetto Cassinis, L.; Fonod, R.; Gill, E.; Ahrns, I.; Gil-Fernández, J. Evaluation of Tightly- and Loosely-Coupled Approaches in CNN-Based Pose Estimation Systems for Uncooperative Spacecraft. Acta Astronaut. 2021, 182, 189–202. [Google Scholar] [CrossRef]
  5. Huang, H.; Yang, Y.; Wang, H.; Ding, Z.; Sari, H.; Adachi, F. Deep Reinforcement Learning for UAV Navigation Through Massive MIMO Technique. IEEE Trans. Veh. Technol. 2020, 69, 1117–1121. [Google Scholar] [CrossRef]
  6. Lin, H.-Y.; Zhan, J.-R. GNSS-Denied UAV Indoor Navigation with UWB Incorporated Visual Inertial Odometry. Measurement 2023, 206, 112256. [Google Scholar] [CrossRef]
  7. Liu, H.; Pan, S.; Wu, P.; Yu, K.; Gao, W.; Yu, B. Uncertainty-Aware UWB/LiDAR/INS Tightly Coupled Fusion Pose Estimation via Filtering Approach. IEEE Sens. J. 2024, 24, 11113–11126. [Google Scholar] [CrossRef]
  8. He, G.; Yuan, X.; Zhuang, Y.; Hu, H. An Integrated GNSS/LiDAR-SLAM Pose Estimation Framework for Large-Scale Map Building in Partially GNSS-Denied Environments. IEEE Trans. Instrum. Meas. 2021, 70, 1–9. [Google Scholar] [CrossRef]
  9. Guangcai, W.; Xu, X.; Zhang, T. M-M Estimation-Based Robust Cubature Kalman Filter for INS/GPS Integrated Navigation System. IEEE Trans. Instrum. Meas. 2021, 70, 1–11. [Google Scholar] [CrossRef]
  10. Sabatini, R.; Roy, A.; Blasch, E.; Kramer, K.A.; Fasano, G.; Majid, I.; Crespillo, O.G.; Brown, D.A.; Ogan Major, R. Avionics Systems Panel Research and Innovation Perspectives. IEEE Aerosp. Electron. Syst. Mag. 2020, 35, 58–72. [Google Scholar] [CrossRef]
  11. Sharaf, R.; Noureldin, A.; Osman, A.; El-Sheimy, N. Online INS/GPS Integration with a Radial Basis Function Neural Network. IEEE Aerosp. Electron. Syst. Mag. 2005, 20, 8–14. [Google Scholar]
  12. Chen, L.; Fang, J. A Hybrid Prediction Method for Bridging GPS Outages in High-Precision POS Application. IEEE Trans. Instrum. Meas. 2014, 63, 1656–1665. [Google Scholar] [CrossRef]
  13. Yao, Y.; Xu, X.; Zhu, C.; Chan, C.-Y. A Hybrid Fusion Algorithm for GPS/INS Integration during GPS Outages. Measurement 2017, 103, 42–51. [Google Scholar]
  14. Taghizadeh, S.; Safabakhsh, R. An Integrated INS/GNSS System With an Attention-Based Deep Network for Drones in GNSS Denied Environments. IEEE Aerosp. Electron. Syst. Mag. 2023, 38, 14–25. [Google Scholar]
  15. Malleswaran, M.; Vaidehi, V.; Sivasankari, N. A Novel Approach to the Integration of GPS and INS Using Recurrent Neural Networks with Evolutionary Optimization Techniques. Aerosp. Sci. Technol. 2014, 32, 169–179. [Google Scholar] [CrossRef]
  16. Dai, H.; Bian, H.; Wang, R.; Ma, H. An INS/GNSS Integrated Navigation in GNSS Denied Environment Using Recurrent Neural Network. Def. Technol. 2020, 16, 334–340. [Google Scholar] [CrossRef]
  17. Doostdar, P.; Keighobadi, J.; Hamed, M.A. INS/GNSS Integration Using Recurrent Fuzzy Wavelet Neural Networks. GPS Solut. 2020, 24, 29. [Google Scholar] [CrossRef]
  18. Bijjahalli, S.; Sabatini, R.; Gardi, A. Advances in Intelligent and Autonomous Navigation Systems for Small UAS. Prog. Aerosp. Sci. 2020, 115, 100617. [Google Scholar]
  19. Fang, W.; Jiang, J.; Lu, S.; Gong, Y.; Tao, Y.; Tang, Y.; Yan, P.; Luo, H.; Liu, J. A LSTM Algorithm Estimating Pseudo Measurements for Aiding INS during GNSS Signal Outages. Remote Sens. 2020, 12, 256. [Google Scholar] [CrossRef]
  20. Zhang, Z.; Li, Y.; Liu, Z.; Wang, S.; Xing, H.; Zhu, W. Enhancing the Reliability of Shipborne INS/GNSS Integrated Navigation System during Abnormal Sampling Periods Using Bi-LSTM and Robust CKF. Ocean Eng. 2023, 288, 115934. [Google Scholar] [CrossRef]
  21. Chen, S.; Xin, M.; Yang, F.; Zhang, X.; Liu, J.; Ren, G.; Kong, S. Error Compensation Method of GNSS/INS Integrated Navigation System Based on AT-LSTM during GNSS Outages. IEEE Sens. J. 2024, 24, 20188–20199. [Google Scholar]
  22. Lv, P.-F.; Lv, J.-Y.; Hong, Z.-C.; Xu, L.-X. Integration of Deep Sequence Learning-Based Virtual GPS Model and EKF for AUV Navigation. Drones 2024, 8, 441. [Google Scholar] [CrossRef]
  23. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling 2014. arXiv 2014, arXiv:1412.3555. [Google Scholar]
  24. Tang, Y.; Jiang, J.; Liu, J.; Yan, P.; Tao, Y.; Liu, J. A GRU and AKF-Based Hybrid Algorithm for Improving INS/GNSS Navigation Accuracy during GNSS Outage. Remote Sensing 2022, 14, 752. [Google Scholar]
  25. Zhao, S.; Zhou, Y.; Huang, T. A Novel Method for AI-Assisted INS/GNSS Navigation System Based on CNN-GRU and CKF during GNSS Outage. Remote Sensing 2022, 14, 4494. [Google Scholar]
  26. Meng, X.; Tan, H.; Yan, P.; Zheng, Q.; Chen, G.; Jiang, J. A GNSS/INS Integrated Navigation Compensation Method Based on CNN–GRU + IRAKF Hybrid Model During GNSS Outages. IEEE Trans. Instrum. Meas. 2024, 73, 1–15. [Google Scholar]
  27. Guo, T.; Jiang, N.; Li, B.; Zhu, X.; Wang, Y.; Du, W. UAV Navigation in High Dynamic Environments: A Deep Reinforcement Learning Approach. Chin. J. Aeronaut. 2021, 34, 479–489. [Google Scholar]
  28. Wu, Z.; Yao, Z.; Lu, M. Deep-Reinforcement-Learning-Based Autonomous Establishment of Local Positioning Systems in Unknown Indoor Environments. IEEE Internet Things J. 2022, 9, 13626–13637. [Google Scholar]
  29. Xue, Y.; Chen, W. Multi-Agent Deep Reinforcement Learning for UAVs Navigation in Unknown Complex Environment. IEEE Trans. Intell. Veh. 2023, 9, 2290–2303. [Google Scholar]
  30. Cheng, J.; Dong, L.; Lapata, M. Long Short-Term Memory-Networks for Machine Reading. arXiv 2016, arXiv:1601.06733. [Google Scholar]
  31. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  32. Wang, Z.; Li, J.; Liu, C.; Yang, Y.; Li, J.; Wu, X.; Yang, Y.; Ye, B. Optimization-Assisted Filter for Flow Angle Estimation of SUAV Without Adequate Measurement. Drones 2024, 8, 758. [Google Scholar] [CrossRef]
  33. Hu, G.; Wang, W.; Zhong, Y.; Gao, B.; Gu, C. A New Direct Filtering Approach to INS/GNSS Integration. Aerosp. Sci. Technol. 2018, 77, 755–764. [Google Scholar]
  34. Cho, A.; Kim, J.; Lee, S.; Kee, C. Wind Estimation and Airspeed Calibration Using a UAV with a Single-Antenna GPS Receiver and Pitot Tube. IEEE Trans. Aerosp. Electron. Syst. 2011, 47, 109–117. [Google Scholar]
  35. Yang, Y.; Liu, X.; Liu, X.; Guo, Y.; Zhang, W. Model-Free Integrated Navigation of Small Fixed-Wing UAVs Full State Estimation in Wind Disturbance. IEEE Sens. J. 2022, 22, 2771–2781. [Google Scholar]
  36. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar]
  37. Jiang, W.; Cai, T.; Xu, G.; Wang, Y. Autonomous Obstacle Avoidance and Target Tracking of UAV: Transformer for Observation Sequence in Reinforcement Learning. Knowl.-Based Syst. 2024, 290, 111604. [Google Scholar]
  38. Zhuang, Y.; Sun, X.; Li, Y.; Huai, J.; Hua, L.; Yang, X.; Cao, X.; Zhang, P.; Cao, Y.; Qi, L.; et al. Multi-Sensor Integrated Navigation/Positioning Systems Using Data Fusion: From Analytics-Based to Learning-Based Approaches. Inf. Fusion 2023, 95, 62–90. [Google Scholar]
  39. Gandhi, A.; Sharma, A.; Biswas, A.; Deshmukh, O. GeThR-Net: A Generalized Temporally Hybrid Recurrent Neural Network for Multimodal Information Fusion. In Computer Vision–ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8-10 and 15-16, 2016, Proceedings, Part II 14; Hua, G., Jégou, H., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 9914, pp. 883–899. ISBN 978-3-319-48880-6. [Google Scholar]
Figure 1. Research workflow for GNSS-denied navigation.
Figure 2. The involved coordinate system: wind frame, body frame, and navigation frame. The body frame is denoted as “ x b y b z b ” (front–right–down), and the navigation frame is denoted as “NED” (north–east–down). D , Y , and L are the drag, lateral, and lift forces.
Figure 3. Integrated navigation method (no background) and neural network training framework (gray background). The outputs of the integrated navigation system include position ( P ^ ), velocity ( V ^ ), and quaternion ( q ^ ). The ξ ^ values used for training are converted from q ^ .
Figure 4. SALSTM-enhanced integrated navigation framework in a GNSS-denied environment.
Figure 5. SALSTM architecture, which consists of four layers with a self-attention layer inserted in three LSTM layers. In this study, the parameter n is set to 72, and m is set to 10.
Figure 6. LSTM component.
Figure 7. Fixed-wing UAV Talon used for field tests. (Mass of Talon: 5.6 kg; cruise speed: 18 m/s).
Figure 8. The flight path for data collection. Dataset 1 was used for training (left), and dataset 2 was used for an independent test (right). The red-marked segment represents the valid data used for testing.
Figure 9. Accuracy performance of SALSTM. Training set (left) and validation set (right).
Figure 10. The Euler attitude error of the estimates. EKF (yellow), Transformer (green), HLSTM (blue), and SALSTM (red).
Figure 11. Details of the estimated roll attitude results for different schemes.
Figure 12. The velocity error of the estimates. EKF (yellow), Transformer (green), HLSTM (blue), and SALSTM (red).
Figure 13. NED velocity estimation performance. Ref (gray), Transformer (green), HLSTM (blue), and SALSTM (red).
Figure 14. The position error of the estimates. EKF (yellow), Transformer (green), HLSTM (blue), and SALSTM (red).
Figure 15. Position estimation performance. Ref (gray), EKF (yellow), Transformer (green), HLSTM (blue), and SALSTM (red).
Table 1. Sensor performance parameters.

| Sensor | Type | Noise/Error | Frequency |
|---|---|---|---|
| BMI160 | Gyroscope | 0.014°/s/√Hz | 250 Hz |
| BMI160 | Accelerometer | 150 µg/√Hz | 250 Hz |
| MXT906AM | GPS receiver | pos: 2.5 m; vel: 0.1 m/s | 5 Hz |
| LIS3MDL | Magnetometer | 3.2 mG | / |
| MS5611 | Barometer | 0.027 mbar | 1024 Hz |
| MS5525 | Pressure sensor and Pitot tube | 0.84 Pa | 100 Hz |
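The noise densities in Table 1 can be turned into per-sample standard deviations for filter tuning. The sketch below is our own illustration, not code from the paper; it assumes a white-noise sensor model and approximates the noise bandwidth by the 250 Hz sampling rate from the table.

```python
import math

def per_sample_sigma(noise_density, sample_rate_hz):
    """White-noise model: sigma = density * sqrt(bandwidth),
    with the bandwidth approximated by the sampling rate."""
    return noise_density * math.sqrt(sample_rate_hz)

# BMI160 gyroscope: 0.014 deg/s/sqrt(Hz) sampled at 250 Hz
gyro_sigma = per_sample_sigma(0.014, 250.0)    # deg/s per sample, ~0.22
# BMI160 accelerometer: 150 ug/sqrt(Hz) sampled at 250 Hz
accel_sigma = per_sample_sigma(150e-6, 250.0)  # g per sample, ~0.0024
```

These per-sample values are what would seed the process-noise covariance of an EKF running at the IMU rate.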
Table 2. Parameters of various models.

| Model | Cost Function | Batch Size | Dropout | Learning Rate | Activation Function | Optimizer |
|---|---|---|---|---|---|---|
| Transformer | MSE | 300 | 0.2 | 0.01 | GELU, Softmax | Adam |
| LSTM, HLSTM | MSE | 300 | 0.2 | 0.01 | tanh, sigmoid | Adam |
| SALSTM | MSE | 300 | 0.2 | 0.01 | Softmax, tanh, sigmoid | Adam |
Table 3. Average inference times of various models (based on 10 trials).

| Model | Transformer | LSTM | HLSTM | SALSTM |
|---|---|---|---|---|
| Total params | 187,229 | 177,600 | 190,290 | 166,243 |
| Inference time (s) | 0.042 | 0.044 | 0.047 | 0.044 |
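A multi-trial timing protocol like the one behind Table 3 can be sketched as follows. This is an assumed harness, not the authors' benchmark code; the warm-up call and the use of a monotonic clock are our own conventions.

```python
import time

def mean_inference_time(model_fn, x, trials=10):
    """Average wall-clock time of model_fn(x) over `trials` runs,
    after one warm-up call to exclude one-off setup costs."""
    model_fn(x)  # warm-up
    start = time.perf_counter()
    for _ in range(trials):
        model_fn(x)
    return (time.perf_counter() - start) / trials

# Usage with a stand-in workload in place of a trained network:
avg_s = mean_inference_time(lambda seq: sum(v * v for v in seq),
                            list(range(10_000)))
```

Averaging over ten trials, as in the table, smooths out scheduler jitter that a single-shot measurement would pick up.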
Table 4. RMSE of attitude estimates of various methods during GNSS denial.

| State | EKF | Transformer | HLSTM | SALSTM |
|---|---|---|---|---|
| Roll (°) | 2.16 | 3.52 | 1.72 | 1.36 |
| Pitch (°) | 2.79 | 1.52 | 1.17 | 0.98 |
| Yaw (°) | 9.55 | 32.50 | 11.97 | 9.39 |
Table 5. RMSE of ground velocity estimates of various methods during GNSS denial.

| State | EKF | Transformer | HLSTM | SALSTM |
|---|---|---|---|---|
| V_N (m/s) | 20.05 | 3.04 | 2.14 | 1.02 |
| V_E (m/s) | 30.16 | 9.00 | 1.33 | 1.06 |
| V_D (m/s) | 0.17 | 0.53 | 0.28 | 0.27 |
Table 6. RMSE of position estimates of various methods during GNSS denial.

| State | EKF | Transformer | HLSTM | SALSTM |
|---|---|---|---|---|
| P_N (m) | 461.37 | 574.44 | 84.75 | 26.84 |
| P_E (m) | 1477.98 | 2164.73 | 279.39 | 138.51 |
| P_D (m) | 6.83 | 6.54 | 7.10 | 7.16 |
Table 7. Maximum position estimation error of various methods during GNSS denial.

| State | EKF | Transformer | HLSTM | SALSTM |
|---|---|---|---|---|
| P_N (m) | 991.58 | 897.83 | 139.60 | 35.27 |
| P_E (m) | 2525.42 | 3614.90 | 357.24 | 158.67 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Wang, Z.; Shen, X.; Li, J.; Li, J.; Wu, X.; Yang, Y. Enhancing Integrated Navigation with a Self-Attention LSTM Hybrid Network for UAVs in GNSS-Denied Environments. Drones 2025, 9, 279. https://doi.org/10.3390/drones9040279