1. Introduction and Outline
The main objective of this article is to develop a navigation system capable of diminishing the position drift inherent to GNSS (Global Navigation Satellite System)-denied flight of an autonomous fixed-wing aircraft, so that if GNSS signals become unavailable during flight, the vehicle has a higher probability of reaching the vicinity of a distant recovery point, from where it can be landed by remote control. The proposed approach combines two different navigation systems: an inertial one, based on the outputs of all onboard sensors except the camera, and a visual one, which relies exclusively on images of the Earth surface taken from the aircraft. The two systems simultaneously assist and control each other, resulting in a mutually reinforcing feedback loop with major improvements in horizontal position estimation accuracy compared with either system by itself.
The fusion between the inertial and visual systems is a two-step process. The first step, described in [1], feeds the visual system with the inertial attitude and altitude estimations to assist with its nonlinear pose optimizations, resulting in major horizontal position accuracy gains with respect to both a purely visual system (without inertial assistance) and a standalone inertial one such as that described in [2]. The second step, which is the focus of this article, feeds back the visual horizontal position estimations into the inertial navigation filter as if they were the outputs of a virtual vision sensor, or VVS, replacing those of the GNSS receiver, and results in additional horizontal position accuracy improvements.
Section 2 describes the mathematical notation employed throughout the article. An introduction to GNSS-denied navigation and its challenges, together with a review of the state of the art in visual inertial navigation, is included in Section 3. Section 4 and Section 5 discuss the article novelty and main applications, respectively. The proposed architecture to combine the inertial and visual navigation algorithms is discussed in Section 6, which relies on the VVS described in Section 7. The VVS outputs are employed by the navigation filter, which is described in detail in Section 8. Section 9 introduces the stochastic high-fidelity simulation employed to evaluate the navigation results by means of Monte Carlo executions of two scenarios representative of the challenges of GNSS-denied navigation. The results obtained when applying the proposed algorithms to these scenarios are described in Section 10, comparing them with those achieved by standalone inertial and visual systems. Last, Section 11 summarizes the results, while Appendix A describes the obtainment of the various Jacobians employed in the proposed navigation filter.
2. Mathematical Notation
The meaning of all variables is specified on their first appearance. A hat accent denotes an estimated value, a tilde a measured value, and a dot a time derivative. Vectors are displayed in bold; other symbols employed with vectors include the skew-symmetric form and the double vertical bars, which denote the norm. The superindex T denotes the transpose of a vector or matrix. For scalars, single vertical bars denote the absolute value. The left arrow represents an update operation, in which the value on the right of the arrow is assigned to the variable on the left.
Four different reference frames are employed in this article: the ECEF (Earth centered Earth fixed) frame, centered at the Earth center of mass, with one axis pointing towards the geodetic north along the Earth rotation axis, another contained in both the equator and zero longitude planes, and a third orthogonal to the other two so that they form a right-handed system; the NED (north east down) frame, centered at the aircraft center of mass, with axes aligned with the geodetic north, east, and down directions; the body frame, centered at the aircraft center of mass, with one axis contained in the plane of symmetry of the aircraft pointing forward along a fixed direction, another contained in the plane of symmetry of the aircraft, normal to the first and pointing downward, and a third orthogonal to both in such a way that they form a right-handed system; and the inertial frame, centered at the Earth center of mass, with axes that do not rotate with respect to any stars other than the Sun. The first three frames are graphically depicted in Figure 1.
This article makes use of the space of rigid body rotations and, hence, relies on the Lie algebra of the special orthogonal group of rigid body rotations, known as SO(3); refs. [3,4,5] are recommended as references. Generic rotations are usually parameterized by the unit quaternion, although the direction cosine matrix is also employed as required; tangent space representations include the rotation vector and the angular velocity. Related symbols include the quaternion conjugate, the quaternion product, and the plus and minus operators.
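The plus and minus operators can be illustrated with a short sketch. The following minimal Python implementation is an assumption-laden illustration, not notation taken from the article: it assumes quaternions stored as [w, x, y, z] and the right-perturbation convention, and shows how a rotation vector in the tangent space is wrapped onto the SO(3) manifold and back:

```python
import numpy as np

def quat_mult(p, q):
    # Hamilton product of quaternions stored as [w, x, y, z]
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

def quat_conj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def so3_exp(r):
    # capitalized exponential map: rotation vector -> unit quaternion
    angle = np.linalg.norm(r)
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(0.5*angle)], np.sin(0.5*angle)*r/angle))

def so3_log(q):
    # capitalized logarithmic map: unit quaternion -> rotation vector
    v = q[1:]
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return 2.0*np.arctan2(n, q[0])*v/n

def plus(q, r):
    # right-plus: perturb the attitude q by the local tangent vector r
    return quat_mult(q, so3_exp(r))

def minus(q2, q1):
    # right-minus: local tangent vector taking q1 to q2
    return so3_log(quat_mult(quat_conj(q1), q2))
```

By construction, minus(plus(q, r), q) recovers r for any rotation vector of magnitude below pi, which is the consistency property that the manifold filter formulation relies on.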
Superindexes are employed over vectors to specify the frame in which they are viewed (e.g., the ground velocity may be viewed in either the NED or the body frame). Subindexes may be employed to clarify the meaning of a variable or vector, but may also indicate a given vector component. In addition, where two reference frames appear as subindexes to a vector, the vector goes from the first frame to the second. Jacobians are represented with a subindex and a superindex; the former identifies the function domain, while the latter applies to the function image or codomain.
3. GNSS-Denied Navigation and Visual Inertial Odometry
The number, variety, and applications of unmanned air vehicles (UAVs) have grown exponentially in the last few years, and the trend is expected to continue in the future [6,7]. A comprehensive review of small UAV navigation systems and the problems they face is included in [8]. The use of GNSS constitutes one of the main enablers for autonomous inertial navigation [9,10,11], but it is also one of its main weaknesses, as in the absence of GNSS signals, inertial systems are forced to rely on dead reckoning, which results in position drift, with the aircraft slowly but steadily deviating from its intended route [2,12]. The availability of GNSS signals cannot be guaranteed; a thorough analysis of GNSS threats and reasons for signal degradation is presented in [13].
At this time, there are no comprehensive solutions to the operation of autonomous UAVs in GNSS-denied scenarios, for which the permanent loss of the GNSS signals is equivalent to losing the airframe in an uncontrolled way. A summary of the challenges of GNSS-denied navigation and the research efforts intended to improve its performance is provided by [2,14,15]. Two promising techniques for completely eliminating the position drift are the use of signals of opportunity [16,17,18] (existing signals originally intended for other purposes, such as those of television and cellular networks, can be employed to triangulate the aircraft position) and georegistration [19,20,21,22,23,24,25,26,27] (the position drift can be eliminated by matching landmarks or terrain features as viewed from the aircraft to preloaded data), also known as image registration.
Visual odometry (VO) consists in employing the ground images generated by one or more onboard cameras, without the use of prerecorded image databases or any other sensors, to incrementally estimate the vehicle pose (position plus attitude) based on the changes that its motion induces on the images [28,29,30,31,32]. It requires sufficient illumination, a predominantly static scene, enough texture, and scene overlap between consecutive images or frames. Modern standalone algorithms, such as semidirect visual odometry (SVO) [33,34], direct sparse odometry (DSO) [35], large-scale direct simultaneous localization and mapping (LSD-SLAM) [36], and large-scale feature-based SLAM (ORB-SLAM) [37,38,39] (ORB stands for oriented FAST and rotated BRIEF, a type of blob feature), are robust and exhibit a limited drift. Nevertheless, the incremental concatenation of relative poses results in a slow but unbounded pose drift that disqualifies VO for long-distance autonomous UAV GNSS-denied navigation.
A promising research path to GNSS-denied navigation appears to be the fusion of inertial and visual odometry algorithms into an integrated navigation system. Two different trends coexist in the literature, although an implementation of either group can sometimes employ a technique from the other:
The first possibility, known as filtering, consists in employing the visual estimations as additional observations with which to feed the inertial filter. In the case of attitude estimation exclusively (momentarily neglecting the platform position), the GNSS signals are helpful and enable the inertial navigation system to obtain more accurate and less noisy estimations, but they are not indispensable, as driftless attitude estimations can be obtained based on the inertial measurement unit (IMU) without their use [2]. Proven techniques for attitude estimation in all kinds of platforms that do not rely on GNSS signals include Gaussian filters [40], deterministic filters [41], complementary filters [42], and stochastic filters [43,44]. In the case of the complete pose estimation (both attitude and position), it is indispensable to employ the velocity or incremental position observations obtained with VO methods in the absence of the absolute references provided by GNSS receivers. The literature includes cases based on Gaussian filters [45] and, more recently, nonlinear deterministic filters [46], stochastic filters [47], Riccati observers [48], and invariant extended Kalman filters, or EKFs [49].
The alternative is to employ VO optimizations with the assistance of the inertial navigation estimations, reducing the pose estimation drift inherent to VO. This is known as visual inertial odometry (VIO) [50,51], which can also be combined with image registration to fully eliminate the remaining pose drift. A previous article by the authors [1] describes how the nonlinear VO optimizations can be enhanced by adding priors based on the inertial attitude and altitude estimations. VIO has matured significantly in the last few years, with detailed reviews available in [50,51,52,53,54].
There exist several open-source VIO packages, such as the multistate constraint Kalman filter (MSCKF) [55], the open keyframe visual inertial SLAM (OKVIS) [56,57], the robust visual inertial odometry (ROVIO) [58], the monocular visual inertial navigation system (VINS-Mono) [59], SVO combined with multisensor fusion (MSF) [33,34,60,61], and SVO combined with incremental smoothing and mapping (iSAM) [33,34,62,63]. All these open-source pipelines are compared in [50], and their results when applied to the EuRoC micro air vehicle (MAV) data sets [64] are discussed in [65]. There also exist various other published VIO pipelines with implementations that are not publicly available [66,67,68,69,70,71,72], as well as others that remain fully proprietary.
The existing VIO schemes to fuse the visual and inertial measurements can be broadly grouped into two paradigms: loosely coupled pipelines process the measurements separately, resulting in independent visual and inertial pose estimations, which are then fused to obtain the final estimate [73,74]; on the other hand, tightly coupled methods compute the final pose estimation directly from the tracked image features and the IMU outputs [50,51]. Tightly coupled approaches usually result in higher accuracy, as they use all the information available and take advantage of the IMU integration to predict the feature locations in the next frame. Loosely coupled methods, although less complex and more computationally efficient, lose information by decoupling the visual and inertial constraints, and are incapable of correcting the drift present in the visual estimator.
A different classification involves the number of images involved in each estimation [50,51,75], which is directly related to the resulting accuracy and computing demands. Batch algorithms, also known as smoothers, estimate multiple states simultaneously by solving a large nonlinear optimization problem or bundle adjustment, resulting in the highest possible accuracy. Valid techniques to limit the required computing resources include the reliance on a subset of the available frames (known as keyframes), the separation of tracking and mapping into different threads, and the development of incremental smoothing techniques based on factor graphs [63]. Although employing all available states (full smoothing) is sometimes feasible for very short trajectories, most pipelines rely on sliding window or fixed lag smoothing, in which the optimization relies exclusively on the measurements associated with the last few keyframes, discarding both the old keyframes and other frames that have not been cataloged as keyframes. On the other hand, filtering algorithms restrict the estimation process to the latest state; they require fewer resources but suffer from permanently dropping all previous information and from a much harder identification and removal of outliers, both of which lead to error accumulation or drift.
Additional VIO pipelines are also being developed to take advantage of event cameras [76,77], which measure per-pixel luminosity changes asynchronously instead of capturing images at a fixed rate. Event cameras hold very advantageous properties for visual odometry, including low latency, very high frequency, and excellent dynamic range, making them well suited to deal with the high-speed motion and high dynamic range scenes that are problematic for traditional VO and VIO pipelines. On the negative side, new algorithms are required to cope with the sequence of asynchronous events they generate [50].
6. Proposed Visual Inertial Navigation Architecture
The navigation system processes the measurements of the various onboard sensors and generates the estimated state required by the guidance and control systems, which contains an estimation of the true aircraft state. Note that in this article, the images are considered part of the sensed states, as the camera is treated as an additional sensor. Hardware and computing power limitations often require the camera and visual algorithms to operate at a lower frequency than the inertial algorithms and the remaining sensors. Table 1 includes the operating frequencies employed to generate the Section 10 results. Note that the index t identifies a discrete time instant of the aircraft trajectory, s refers to the sensor outputs, n identifies an inertial estimation, c a control system execution, i a camera image and the corresponding visual estimation, and g a GNSS receiver observation.
The proposed strategy to fuse the inertial and visual navigation systems is based on their performances as standalone systems [1,2], as summarized in Table 2. Inertial systems are clearly superior to visual ones in GNSS-denied attitude and altitude estimation, since their estimations are bounded and do not drift. In the case of horizontal position, both systems drift and are hence inappropriate for long-term GNSS-denied navigation, but for different reasons; while the inertial drift is the result of integrating the bounded ground velocity estimations without absolute position observations, the visual one originates in the slow but continuous accumulation of estimation errors between consecutive images.
It is also worth noting that inertial estimations are significantly noisier than visual ones, so the latter, although worse on an absolute basis, are often more accurate when evaluating the estimations incrementally, that is, from one image to the next, or even over the time interval corresponding to a few images. When applied to the horizontal position, the high accuracy of the incremental image-to-image displacement observations (equivalent to ground velocity observations) constitutes the basis for the second phase of the fusion described in the following sections.
This article proposes an inertial navigation filter (Section 8) that, in the absence of GNSS signals, takes advantage of the incremental displacement outputs generated by a visual system as if they were the observations generated by an additional sensor (Section 7), denoted as the virtual vision sensor, or VVS. The proposed EKF relies on the observations provided by the onboard accelerometers, gyroscopes, and magnetometers, combined with those of the GNSS receiver when signals are available, or those of the VVS when they are not. The inertial navigation filter and the visual system continuously assist and control each other, as depicted in Figure 2. As the integrated system needs to estimate the state at every filter execution, it has three different operation modes, depending not only on whether GNSS signals are available, but also on whether a position and ground velocity observation exists at the execution time:
Given the operating frequencies of the different sensors listed in Table 1, most navigation filter cycles (99 out of 100 for GNSS-based navigation, 9 out of 10 in GNSS-denied conditions) cannot rely on the position and ground velocity observations provided by either the GNSS receiver or the VVS. These cycles successively apply the EKF time update, measurement update, and reset equations (Section 8), noting that the filter observation system needs to be simplified by removing the ground velocity and position observations, which are not available.
GNSS-denied filter cycles for which the VVS position and ground velocity observations are available at the required time rely on the Figure 3 scheme. Note that this case only occurs in 1 out of 10 executions in GNSS-denied conditions. The first part of the process is exactly the same as in the previous case, estimating first the a priori state and covariance, followed by the a posteriori ones, and finally the estimated state. Note that as these estimations do not employ the VVS measurements (that is, do not employ the corresponding image), they are preliminary and hence denoted with the subindex PRE to avoid confusion.
The preliminary estimation, together with the image, is then passed to the visual system [1] to obtain a visual state which, as explained in Section 7, is equivalent to the VVS position and ground velocity outputs. Only then is it possible to apply the complete navigation filter measurement update and reset equations a second time to obtain the final state estimation.
GNSS-based filter cycles for which the GNSS receiver position and ground velocity observations are available at the required time. This case, which only occurs in 1 out of 100 executions when GNSS signals are available, is in fact very similar to the first one above, the only difference being that the navigation filter observation system does not need any simplifications.
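As a minimal illustration of the cycle bookkeeping, the ratios quoted above (one observation in 100 cycles with GNSS, one in 10 with the VVS) can be reproduced with assumed rates of 100 Hz for the navigation filter, 10 Hz for the camera, and 1 Hz for the GNSS receiver; these specific numbers are hypothetical, chosen only for consistency with the quoted ratios, since Table 1 is not reproduced here:

```python
# Hypothetical rates, chosen only to match the 1-in-100 and 1-in-10
# ratios mentioned in the text; Table 1 is not reproduced here.
NAV_FREQ = 100.0   # navigation filter rate [Hz]
IMG_FREQ = 10.0    # camera / VVS rate [Hz]
GNSS_FREQ = 1.0    # GNSS receiver rate [Hz]

def has_pos_vel_observation(n: int, gnss_available: bool) -> bool:
    """True when filter cycle n coincides with a position and ground
    velocity observation (GNSS receiver or VVS, depending on the mode)."""
    source_freq = GNSS_FREQ if gnss_available else IMG_FREQ
    return n % int(NAV_FREQ / source_freq) == 0
```

With these rates, 10 out of every 100 GNSS-denied cycles carry a VVS observation, and only 1 out of every 100 GNSS-based cycles carries a receiver observation, matching the proportions stated above.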
7. Virtual Vision Sensor
The virtual vision sensor (VVS) constitutes an alternative denomination for the incremental displacement outputs obtained by the inertial assisted visual system described in [1]. Its horizontal position estimations drift with time due to the absence of absolute observations, but when evaluated on an incremental basis, that is, from one image to the next, the incremental displacement estimations are quite accurate. This section shows how to convert these incremental estimations into measurements for the geodetic coordinates and the ground velocity, so the VVS can smoothly replace the GNSS receiver within the navigation filter in the absence of GNSS signals (Section 8).
Their obtainment relies on the current visual state generated by the visual system, on the one obtained with the previous frame, and on the estimated state corresponding to the previous image generated by the inertial system. The relationship between the image index i and the inertial index n is given by the number of inertial executions for every image being processed (Table 1).
To obtain the VVS velocity observations, it is necessary to first compute the time derivatives of the geodetic coordinates (longitude, latitude, altitude) based on the difference between their values corresponding to the last two images (2), followed by their transformation into a ground velocity per (3), in which M and N represent the WGS84 radii of curvature of meridian and prime vertical, respectively. Note that the resulting velocity observation is very noisy given how it is obtained.
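A sketch of this conversion, assuming angles in radians and the standard geodetic-rate to NED-velocity relations with the WGS84 radii of curvature (function names are illustrative, not taken from the article):

```python
import numpy as np

WGS84_A = 6378137.0           # semimajor axis [m]
WGS84_E2 = 6.69437999014e-3   # first eccentricity squared

def radii_of_curvature(lat):
    """WGS84 meridian (M) and prime vertical (N) radii of curvature [m]."""
    s2 = np.sin(lat) ** 2
    N = WGS84_A / np.sqrt(1.0 - WGS84_E2 * s2)
    M = N * (1.0 - WGS84_E2) / (1.0 - WGS84_E2 * s2)
    return M, N

def vvs_velocity(lon0, lat0, h0, lon1, lat1, h1, dt):
    """NED ground velocity from two consecutive visual geodetic fixes.

    Finite differences between the last two images give the geodetic
    rates (Eq. (2)); the WGS84 radii of curvature then convert them
    into a NED velocity (Eq. (3)). Angles in radians, altitudes in m.
    """
    lon_dot = (lon1 - lon0) / dt
    lat_dot = (lat1 - lat0) / dt
    h_dot = (h1 - h0) / dt
    M, N = radii_of_curvature(lat1)
    v_north = (M + h1) * lat_dot
    v_east = (N + h1) * np.cos(lat1) * lon_dot
    v_down = -h_dot
    return np.array([v_north, v_east, v_down])
```

Because the velocity is obtained by differencing two noisy position fixes over a short interval, any position error is amplified by 1/dt, which is why the resulting observation is so noisy.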
With respect to the VVS geodetic coordinates, the sensed longitude and latitude can be obtained per (5) and (6) by propagating the previous inertial estimations (those corresponding to the time of the previous VVS reading) with their visually obtained time derivatives. To avoid drift, the geometric altitude is estimated based on the barometer observations, assuming that the atmospheric pressure offset remains frozen from the time the GNSS signals are lost [2].
With respect to the covariances, the authors have assigned the VVS an ad hoc position standard deviation one order of magnitude lower than that employed for the GNSS receiver, so the navigation filter (Section 8) closely tracks the position observations provided by the VVS. Given the noisy nature of the virtual velocity observations, the authors have preferred to employ a dynamic evaluation for the velocity standard deviation, which coincides with the absolute value (for each of the three dimensions) of the difference between each new velocity observation and the running average of the last 20 readings. In this way, when GNSS signals are available, the covariance of the position observations is higher, so the navigation filter slowly corrects the position estimations obtained from integrating the state equations instead of closely adhering to the noisy GNSS position measurements; when GNSS signals are not present, the low VVS position observation covariance combined with the high velocity one implies that the navigation EKF (Section 8) closely tracks the VVS position readings (4), continuously adjusting the aircraft velocity estimations, with little influence from the VVS velocity observations, in order to complement its position estimations.
8. Proposed Navigation Filter
As graphically depicted in Figure 2, the navigation filter estimates the aircraft pose based on the observations provided by the onboard sensors, with the exception of the camera, complemented if necessary (in the absence of GNSS signals) by the visual observations provided by the VVS. Implemented as an EKF, the objective of the navigation filter is the estimation of the aircraft attitude between the NED and body frames, represented by its unit quaternion [3], together with the aircraft position, represented by its geodetic coordinates. Additional state vector members include the aircraft angular velocity; the ground velocity; the specific force or nongravitational acceleration; the full gyroscope, accelerometer, and magnetometer errors, which include all sensor error sources except system noise; and the difference between the Earth magnetic field provided by the onboard model and the real one.
Rigid body rotations are characterized by their orthogonality (they preserve the norm of vectors and the angle between vectors) and their handedness (they preserve the orientation or cross product between vectors); when encoded as quaternions, these two constraints converge into a single one, the unitary norm, as only unit quaternions are valid parameterizations of rigid body rotations [3,4,5]. As the body attitude belongs to the non-Euclidean special orthogonal group SO(3), the classical EKF scheme [78], which relies on linear algebra, shall be replaced by the alternative formulation proposed here to ensure that the estimated attitude never deviates from its manifold, that is, to ensure that the quaternion remains unitary at all times [79]. Relying on linear algebra implies treating the unit quaternion as a 4-vector in both the time update and measurement update equations of each EKF cycle, and then normalizing the resulting nonunit quaternion at the end of each cycle. Although this is an established practice that provides valid results when operating at the high frequencies typical of inertial sensors, the slower frequencies characteristic of visual pipelines (refer to Table 1), combined with the lack of absolute position observations in the absence of GNSS signals, result in slight but non-negligible inaccuracies in each cycle, which can accumulate into significant errors when the aircraft needs to navigate without absolute position observations for long periods of time.
The use of Lie theory, with its manifolds and tangent spaces, enables the construction of rigorous calculus techniques to handle uncertainties, derivatives, and integrals of non-Euclidean elements with precision and ease [4]. Lie theory can hence be employed to generate an alternative EKF scheme suitable for state vectors containing non-Euclidean components [5]. This alternative EKF scheme differs in four key points from the classical Euclidean one; these four differences are discussed in the following paragraphs. Note that the GNSS-denied navigation results discussed in Section 10 cannot be obtained without the use of Lie algebra and the alternative EKF formulation.
The most rigorous and precise way to adapt the EKF scheme to the presence of non-Euclidean elements, such as rigid body rotations, is to exclude the rotation element from the state vector, replacing the unit quaternion with a local tangent space perturbation represented by the rotation vector [4,5]. Note that the state vector (9) includes the velocity of the rotation element as it moves along its manifold (its angular velocity), also contained in the local tangent space. Each filter step now consists in estimating the rotation element, the state vector, and its covariance, based on their values at the previous step and on the current observations.
The observations are provided by the gyroscopes, which measure the inertial angular velocity; the accelerometers, which provide the specific force or nongravitational acceleration; the magnetometers, which measure the magnetic field; and either the GNSS receiver or the VVS (Section 7), which provides the position (geodetic coordinates) and ground velocity observations. Note that the measurements are provided at different frequencies, as listed in Table 1; most are available at every filter cycle, but the ground velocity and position measurements are generated at a much lower rate, whether provided by the GNSS receiver or, when the GNSS signals are not available, by the VVS.
8.1. Time Update Equations
The variation with time of the state variables (11) comprises a continuous time nonlinear state system composed of three exact equations (with no simplifications), (12), (14), and (15), together with six other differential equations, (13) and (16), in which the filter does not possess any knowledge about the time derivatives of the state variables, which are hence set to zero. The three exact equations are the following:
Note that expression (15) involves the motion angular velocity experienced by any object that moves without modifying its attitude with respect to the Earth surface, the gravity acceleration modeled by the navigation system, and the Coriolis acceleration, all viewed in the same frame. In expression (11), the process noise, although present, is not shown in Equations (12) through (16).
It is important to remark that this alternative EKF implementation, which removes the rotation element from the state vector and instead relies on the local tangent space perturbation, avoids the use of (17), whose integration in the classical EKF scheme considers the unit quaternion as a Euclidean 4-vector without respecting its unitary constraint, requiring a normalization after each filter step. Avoiding these normalizations and their associated errors constitutes the first significant advantage of the alternative EKF scheme presented in this article.
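The difference between the two attitude propagation schemes can be checked numerically. The manifold update composes the quaternion with the exponential map of the integrated angular velocity and therefore stays unit by construction, while Euler-integrating the quaternion kinematics as a 4-vector lets the norm drift, which is exactly the normalization error discussed above. This is a simplified constant-angular-velocity sketch, not the filter's actual time update:

```python
import numpy as np

def quat_mult(p, q):
    # Hamilton product of quaternions stored as [w, x, y, z]
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

def exp_quat(r):
    # exponential map: rotation vector -> unit quaternion
    a = np.linalg.norm(r)
    if a < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(0.5*a)], np.sin(0.5*a)*r/a))

def propagate_manifold(q, omega, dt):
    # manifold time update q <- q * Exp(omega*dt): unit by construction
    return quat_mult(q, exp_quat(omega*dt))

def propagate_euclidean(q, omega, dt):
    # classical 4-vector Euler step of q_dot = 0.5 * q * (0, omega);
    # the quaternion norm drifts and must be renormalized afterwards
    return q + dt * 0.5 * quat_mult(q, np.concatenate(([0.0], omega)))
```

Running both propagations for a few thousand steps shows the manifold quaternion holding unit norm to machine precision while the Euclidean one visibly departs from it, step after step.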
The linearization of the (11) system results in (18), which makes use of the system matrix, where the obtainment of the various Jacobians is described in Appendix A. Those components of the system matrix not individually listed in (19) through (23) are zero. Note that the linearization neglects the influence of the geodetic coordinates on four different terms present in the (11) differential equations, but includes the influence of the aircraft velocity on three of them (gravity does not depend on velocity). The reason is that the velocity variation that can be experienced in a single state estimation step can have a significant influence on the values of the variables considered, while that of the geodetic coordinates does not and hence can be neglected.
The use of Lie Jacobians within (21) and (23), as well as in (31), (33), (37), and (39) below, constitutes the second difference between the proposed EKF scheme and the Euclidean one (refer to Appendix A for details on how these Jacobians are obtained). Although not intuitive, Lie Jacobians adhere to the concept of matrix derivative (Jacobian), as the infinitesimal variations in their formulation are contained within the corresponding tangent spaces and do not deviate from them [4,5]. In contrast, employing the classical EKF scheme with the quaternion as part of the state vector implies establishing independent partial derivatives with respect to each of the four quaternion coefficients, which lack any physical meaning in the absence of the unitary constraint that links their values together.
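The tangent-space meaning of a Lie Jacobian can be verified numerically: the SO(3) right Jacobian maps an infinitesimal change of the rotation vector into a local tangent perturbation, so Exp(r + d) is approximately Exp(r) Exp(J_r(r) d) for small d. The sketch below uses the standard closed-form right Jacobian from the Lie theory references cited above, not the article's Appendix A expressions:

```python
import numpy as np

def skew(v):
    # skew-symmetric (cross-product) matrix of a 3-vector
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(r):
    # exponential map: rotation vector -> rotation matrix (Rodrigues)
    a = np.linalg.norm(r)
    if a < 1e-12:
        return np.eye(3)
    K = skew(r / a)
    return np.eye(3) + np.sin(a)*K + (1.0 - np.cos(a))*(K @ K)

def right_jacobian(r):
    # closed-form SO(3) right Jacobian J_r(r)
    a = np.linalg.norm(r)
    K = skew(r)
    if a < 1e-6:
        return np.eye(3) - 0.5*K
    return (np.eye(3) - (1.0 - np.cos(a))/a**2 * K
            + (a - np.sin(a))/a**3 * (K @ K))
```

For a finite rotation vector r and a small increment d, comparing Exp(r + d) against Exp(r) Exp(J_r(r) d) shows agreement to second order in d, confirming that the derivative lives in the tangent space rather than in the ambient parameter space.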
8.2. Measurement Update Equations
The discrete time nonlinear observation system (24) contains the observations provided by the different sensors, without any simplifications. It involves the Earth angular velocity caused by the Earth rotation around its axis at a constant rate, as well as the measurement or observation noise, which, although present, is not shown in Equations (25) through (29).
Its linearization results in the output matrix, where Appendix A describes the obtainment of the various Jacobians present in (31), (33), (37), and (39). Those components of the output matrix not individually listed in (31) through (41) are zero. As in the state system case above, the linearization also neglects the influence of the geodetic coordinates on three different terms present in the (24) observation equations.
8.3. Covariances
Based on the above linearized continuous state system (18) and discrete observations (30), the time and measurement update equations within each EKF cycle (there are no differences here between the proposed and classical EKF schemes) [5,78] result in estimations for both the state vector and its covariance. Note that the definition of the covariance matrix in (42) combines that of its Euclidean components (46) with its local Lie counterparts (43) [5], with additional combined members.
Uncertainty around a given attitude is properly expressed as a covariance on the tangent space [4]. The covariance definition in (43) represents the third advantage of the proposed EKF scheme, as it implements the concept of covariance applied to SO(3) attitudes instead of Euclidean vectors, as in (46). Its size is 3 × 3, as rotations contain three degrees of freedom, and each coefficient contains the covariance when rotating about a specific axis viewed in the local tangent space; in contrast, introducing the unit quaternion in the state vector demands a 4 × 4 covariance matrix in which the coefficients do not have any physical meaning. Note that if required, it is possible to obtain the covariance for each of the three Euler angles (yaw, pitch, roll) by rotating the covariance to view it in frames other than the local one [5].
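As an illustration of the tangent-space covariance concept described above, the following Python sketch (the article's reference implementation is in C++ [80]; all numeric values here are hypothetical) builds a 3 × 3 attitude covariance in the local tangent space and rotates it so it is viewed in a different frame:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(phi):
    """Rodrigues formula: map a rotation vector to a rotation matrix."""
    angle = np.linalg.norm(phi)
    if angle < 1e-12:
        return np.eye(3)
    K = skew(phi / angle)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * K @ K

# 3x3 attitude covariance expressed in the local tangent space
# (variance in rad^2 when rotating about each local axis).
P_local = np.diag([1e-4, 4e-4, 9e-4])

# Hypothetical rotation to another frame of interest: 90 deg about z.
R = exp_so3(np.array([0.0, 0.0, np.pi / 2]))

# Viewing the same uncertainty in the rotated frame is a congruence transform.
P_other = R @ P_local @ R.T
```

The congruence transform preserves symmetry and total variance (the trace), while redistributing the per-axis variances according to the rotation.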
8.4. Reset Equations
The fourth and final difference between the suggested EKF scheme and the Euclidean one is that after each filter estimation cycle, it is necessary to reset the estimated tangent space perturbation to zero while updating the estimations for the rotation element and the error covariance accordingly; the estimations for the angular velocity and the Euclidean components are not modified. Note that the accuracy of the linearizations required to obtain the system and output matrices rests on first-order Taylor expansions, whose validity is directly related to the size of the tangent space perturbations. Although it is not strictly necessary to reset the perturbations in every EKF cycle, the accuracy of the whole state estimation process improves when the perturbations are kept as small as possible, so it is recommended never to bypass the reset step.
Taking into account that the rotation element is going to be updated per (
49), the error covariance is propagated to the new rotation element as follows [
5], where the Lie Jacobian is provided by (
A7) within
Appendix A:
Last, the rotation element is propagated by means of the local rotation vector perturbation, which is itself reset to zero:
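The reset step can be sketched in Python as follows. This is an illustrative sketch, not the article's implementation: it uses the standard SO(3) right Jacobian as a stand-in for the Lie Jacobian of (A7), and all numeric values are hypothetical.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(phi):
    """Rodrigues formula: rotation vector -> rotation matrix."""
    angle = np.linalg.norm(phi)
    if angle < 1e-12:
        return np.eye(3)
    K = skew(phi / angle)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * K @ K

def right_jacobian(phi):
    """SO(3) right Jacobian (stand-in for the Lie Jacobian of (A7))."""
    angle = np.linalg.norm(phi)
    K = skew(phi)
    if angle < 1e-9:
        return np.eye(3) - 0.5 * K
    return (np.eye(3)
            - (1.0 - np.cos(angle)) / angle**2 * K
            + (angle - np.sin(angle)) / angle**3 * K @ K)

def reset_step(R, delta_theta, P):
    """Fold the estimated tangent perturbation into the rotation estimate,
    propagate the 3x3 attitude covariance, and zero the perturbation."""
    J = right_jacobian(delta_theta)
    R_new = R @ exp_so3(delta_theta)   # update the rotation element
    P_new = J @ P @ J.T                # propagate covariance to new element
    return R_new, np.zeros(3), P_new

# Example reset with a small estimated perturbation.
R0 = exp_so3(np.array([0.1, -0.2, 0.3]))
P0 = 1e-4 * np.eye(3)
R1, delta_reset, P1 = reset_step(R0, np.array([0.01, 0.02, -0.01]), P0)
```

Because the perturbation is small at reset time, the Jacobian is close to identity and the covariance changes only slightly, which is precisely why keeping perturbations small preserves linearization accuracy.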
Different position (geodetic coordinates) and ground velocity system noise values are employed depending on whether the measurements are provided by the GNSS receiver or the VVS:
Coupled with the observation covariances discussed in
Section 7, lower geodetic coordinate system noise values are employed when GNSS signals are available, as the objective is for the solution to avoid position jumps by smoothly following the state equations and only slightly updating the position based on the GNSS observations to avoid any position drift in the long term.
When the position observations are instead supplied by the VVS, higher system noise values are employed so the EKF relies more on the observations and less on the integration. Note that the VVS velocity observations are very noisy because of (
2), so it is better if the EKF closely adheres to the position observations that originate at the visual system, correcting in each step as necessary.
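The effect of this system noise switching can be illustrated with a one-dimensional toy Kalman filter in Python (all variance values are hypothetical and unrelated to the article's tuning): a larger process noise produces a larger steady-state gain, i.e., a filter that adheres more closely to the observations, while a smaller one yields a smoother solution that follows the state equations.

```python
def steady_state_gain(q: float, r: float, iters: int = 200) -> float:
    """Steady-state Kalman gain of a 1-D random-walk filter with process
    (system) noise variance q and observation noise variance r."""
    p = 1.0
    for _ in range(iters):
        p = p + q                  # time update
        k = p / (p + r)            # measurement update gain
        p = (1.0 - k) * p
    return k

# Low system noise (GNSS available): small gain, smooth solution.
k_gnss = steady_state_gain(q=0.01, r=1.0)

# High system noise (VVS position observations): large gain, the filter
# adheres to the observations and corrects in each step.
k_vvs = steady_state_gain(q=1.0, r=1.0)
```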
9. Testing: High-Fidelity Simulation and Scenarios
To evaluate the performance of the proposed navigation algorithms, this article relies on Monte Carlo simulations consisting of one hundred runs each of two different scenarios based on the high-fidelity stochastic flight simulator graphically depicted in
Figure 4. The simulator, whose open-source C++ implementation is available in [80], models the flight of a fixed-wing piston-engine autonomous UAV.
The simulator consists of two distinct processes:
The first, represented by the yellow blocks on the right of
Figure 4, models the physics of flight and the interaction between the aircraft and its surroundings, which results in the real aircraft trajectory.
The second, represented by the green blocks on the left, contains the aircraft systems in charge of ensuring that the resulting trajectory adheres as much as possible to the mission objectives. It includes the different sensors, whose output comprises the sensed trajectory; the navigation system in charge of filtering it to obtain the estimated trajectory; the guidance system that converts the reference objectives into the control targets; and the control system that adjusts the position of the throttle and aerodynamic control surfaces so the estimated trajectory is as close as possible to the reference objectives.
Table 1 lists the working frequencies of the various blocks represented in
Figure 4.
All components of the flight simulator have been modeled with as few simplifications as possible to increase the realism of the results. With the exception of the aircraft performances and its control system, which are deterministic, all other simulator components are treated as stochastic and hence vary from one execution to the next, enhancing the significance of the Monte Carlo simulation results.
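The interplay between the two simulator processes can be sketched as a fixed-step loop in Python; the block names follow the description above, while both frequencies are hypothetical placeholders (the actual working frequencies are those listed in Table 1):

```python
# Hypothetical working frequencies (placeholders, not those of Table 1).
PHYSICS_HZ = 500   # flight physics integration (yellow blocks)
SYSTEMS_HZ = 100   # sensing -> navigation -> guidance -> control (green blocks)

def run(duration_s: float) -> dict:
    """Run the two-process loop for duration_s seconds of simulated time,
    counting how many times each process executes."""
    steps = int(duration_s * PHYSICS_HZ)
    ratio = PHYSICS_HZ // SYSTEMS_HZ
    counters = {"physics": 0, "systems": 0}
    for i in range(steps):
        counters["physics"] += 1        # propagate the real trajectory
        if i % ratio == 0:
            counters["systems"] += 1    # sense, navigate, guide, control
    return counters
```

Running the faster physics process at an integer multiple of the systems frequency keeps the two loops deterministically synchronized.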
Most VIO packages discussed in
Section 3 include in their release articles an evaluation on the EuRoC MAV data sets [64], as do independent evaluations such as [65]. These data sets contain perfectly synchronized stereo images, IMU measurements, and laser-based ground truth readings for eleven different indoor trajectories flown with a MAV, each with a duration on the order of 2 min and a correspondingly short total distance. This fact by itself indicates that the target application of existing VIO implementations differs significantly from the main focus of this article, as there may exist accumulating errors that are completely indiscernible after such short periods of time, but that grow nonlinearly and are capable of inducing significant pose errors when the aircraft remains aloft for long periods. The algorithms presented in this article are hence tested through simulation under two different scenarios designed to analyze the consequences of losing the GNSS signals for long periods of time; these two scenarios coincide with those employed in [1,2] to evaluate the standalone visual and inertial algorithms. Most parameters comprising the scenarios are defined stochastically, resulting in different values for every execution. Note that all results shown in
Section 10 are based on Monte Carlo simulations comprising one hundred runs of each scenario, testing the sensitivity of the proposed navigation algorithms to a wide variety of parameter values.
Scenario #1 has been defined with the objective of adequately representing the challenges faced by an autonomous fixed-wing UAV that suddenly cannot rely on GNSS and hence changes course to reach a predefined recovery location situated at approximately 1 h of flight time. In the process, in addition to executing an altitude and airspeed adjustment, the autonomous aircraft faces significant weather and wind field changes that make its GNSS-denied navigation even more challenging.
With respect to the mission, the stochastic parameters include the initial airspeed, pressure altitude, and bearing; their final values; and the time at which each of the three maneuvers is initiated (turns are executed with a fixed bank angle, altitude changes employ a fixed aerodynamic path angle, and airspeed modifications are automatically executed by the control system as set-point changes). The scenario lasts a fixed total time, with the GNSS signals lost at a predefined instant.
The wind field is also defined stochastically: its two parameters (speed and bearing) are constant both at the beginning and at the conclusion of the scenario, with a linear transition in between. The specific times at which the wind change starts and concludes also vary stochastically among the different simulation runs.
A similar linear transition occurs with the temperature and pressure offsets that define the atmospheric properties, as they are constant both at the start and at the end of the flight.
The turbulence remains strong throughout the whole scenario, but its specific values also vary stochastically from one execution to the next.
Scenario #2 represents the challenges involved in continuing with the original mission upon the loss of the GNSS signals, executing a series of continuous turn maneuvers over a relatively short period of time with no atmospheric or wind variations. As in scenario #1, the GNSS signals are lost at the same predefined instant, but the scenario duration is shorter. The initial airspeed and pressure altitude are defined stochastically and do not change throughout the whole scenario; the bearing, however, changes a total of eight times between its initial and final values, with all intermediate bearing values as well as the time of each turn varying stochastically from one execution to the next. Although the same turbulence is employed as in scenario #1, the wind and atmospheric parameters remain constant throughout scenario #2.
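A minimal Python sketch of how such a stochastic scenario definition might be drawn for the Monte Carlo runs is shown below; every range and parameter name is a hypothetical placeholder, as the article defines its own distributions:

```python
import random

def draw_scenario_1(rng: random.Random) -> dict:
    """Draw one stochastic realization of scenario #1. All ranges are
    hypothetical placeholders, not the article's actual distributions."""
    return {
        "airspeed_initial_mps": rng.uniform(24.0, 30.0),
        "airspeed_final_mps":   rng.uniform(24.0, 30.0),
        "pressure_altitude_initial_m": rng.uniform(2200.0, 2700.0),
        "pressure_altitude_final_m":   rng.uniform(2200.0, 2700.0),
        "bearing_initial_deg": rng.uniform(0.0, 360.0),
        "bearing_final_deg":   rng.uniform(0.0, 360.0),
        # times (s) at which each of the three maneuvers is initiated
        "maneuver_start_s": sorted(rng.uniform(300.0, 3000.0)
                                   for _ in range(3)),
    }

# One hundred Monte Carlo runs, each with its own reproducible parameter draw.
runs = [draw_scenario_1(random.Random(seed)) for seed in range(100)]
```

Seeding each run individually keeps every Monte Carlo execution reproducible while still exercising a wide variety of parameter values.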
11. Summary and Conclusions
This article proves that the inertial and visual navigation systems of an autonomous UAV can be combined in such a way that the resulting long-term GNSS-denied horizontal position drift is only a small fraction of what can be achieved by either system individually. The proposed system, however, does not constitute a full GNSS replacement, as it relies on incremental instead of absolute position observations and hence can only reduce the position drift, not eliminate it.
The proposed visual inertial navigation filter, specifically designed for the challenges faced by autonomous fixed-wing aircraft that encounter GNSS-denied conditions, merges the observations provided by the onboard accelerometers, gyroscopes, and magnetometers with those of the virtual vision sensor, or VVS. The VVS denotes the outputs generated by a visual inertial odometry pipeline that relies on the images of the Earth surface generated by an onboard camera as well as on the navigation filter outputs. The filter is implemented in the manifold of rigid body rotations, or SO(3), in order to minimize the accumulation of errors in the absence of the absolute position observations provided by the GNSS receiver. The results obtained when applying the proposed algorithms to high-fidelity Monte Carlo simulations of two scenarios representative of the challenges of GNSS-denied navigation indicate the following:
The body attitude estimation is qualitatively similar to that obtained by a standalone inertial filter without any visual aid [2]. The bounded estimations enable the aircraft to remain aloft in GNSS-denied conditions for as long as it has fuel. Quantitatively, the VVS observations and the associated more accurate filter equations result in significant accuracy improvements when compared with the baseline of [2].
The vertical position estimation is qualitatively and quantitatively similar to that of the standalone inertial filter [2]. In addition to ionospheric effects (which also apply when GNSS signals are available), the altitude error depends on the amount of pressure offset variation since entering GNSS-denied conditions, being unbiased (zero mean) and bounded by atmospheric physics.
The horizontal position estimation exhibits drastic quantitative improvements over the baseline standalone inertial filter [2], although from a qualitative point of view the estimation error is not bounded, as the drift cannot be fully eliminated.
Future work will focus on addressing some of the limitations of the proposed navigation system, such as the restriction to fixed-wing vehicles (which originates from using the pitot tube airspeed readings to replace visual navigation when not enough features can be extracted from texture-poor terrain) and the inability to fly at night or above clouds (infrared images could potentially be used for this purpose).