Article

Autonomous Navigation of Unmanned Aircraft Using Space Target LOS Measurements and QLEKF

Science and Technology on Space Intelligent Control Laboratory, Beijing Institute of Control Engineering, Beijing 100094, China
* Author to whom correspondence should be addressed.
Sensors 2022, 22(18), 6992; https://doi.org/10.3390/s22186992
Submission received: 4 August 2022 / Revised: 10 September 2022 / Accepted: 13 September 2022 / Published: 15 September 2022

Abstract

An autonomous navigation method based on the fusion of INS (inertial navigation system) measurements with line-of-sight (LOS) observations of space targets is presented for unmanned aircraft. INS/GNSS (global navigation satellite system) integration is the conventional approach to achieving long-term, high-precision navigation of unmanned aircraft. However, the performance of INS/GNSS integrated navigation degrades gradually in a GNSS-denied environment. INS/CNS (celestial navigation system) integrated navigation has been developed as a supplement to the GNSS. A limitation of traditional INS/CNS integrated navigation is that the CNS is not efficient in suppressing the position error of the INS. To solve the abovementioned problems, we studied a novel integrated navigation method, where the position, velocity and attitude errors of the INS are corrected using a star camera mounted on the aircraft to observe space targets whose absolute positions are available. Additionally, a QLEKF (Q-learning extended Kalman filter) is designed to enhance the performance of the integrated navigation system. The effectiveness of the presented autonomous navigation method based on the star camera and the IMU (inertial measurement unit) is demonstrated via CRLB (Cramer–Rao lower bound) analysis and numerical simulations.

1. Introduction

Recently, unmanned aircraft flying at altitudes of 20–100 km above sea level have received increasing interest in both the aeronautic and astronautic fields with respect to various applications [1,2,3]. As one of the key technologies of unmanned aircraft, precise navigation is essential for ensuring the feasibility and efficiency of the onboard guidance and control system [4,5,6].
Currently, the INS/GNSS integrated navigation system is widely used to provide the position, velocity and attitude of unmanned aircraft, such as the X-43A, X-51A and HTV-2. The INS/GNSS integration takes advantage of the consistently high accuracy of the GNSS and the short-term stability of the INS, which is promising for obtaining precise navigation parameters [7,8,9]. However, the performance of INS/GNSS integrated navigation is easily affected by disturbances of the GNSS signals. The use of the INS alone is limited, as its navigation error increases with time and is sensitive to the initial errors [10,11].
To enhance the navigation performance, many researchers have focused on other autonomous navigation approaches [12,13,14,15,16]. A typical method is INS/CNS integration, where the accumulated errors of the gyroscopes in the IMU are compensated by the measurements of star sensors mounted on the aircraft [17,18,19]. Compared with the GNSS, one of the main characteristics of INS/CNS integrated navigation is that it relies on optical signals from celestial bodies, so the high autonomy and anti-disturbance ability of the navigation system are retained. The main limitation of traditional INS/CNS integrated navigation is that the position error accumulation caused by the bias of the accelerometers in the IMU cannot be suppressed efficiently. Thus, it also suffers from error accumulation over long flights. Achieving precise navigation information for unmanned aircraft with a large flight envelope in a GNSS-denied environment remains a challenging problem.
In order to address this problem, this paper presents an INS/Vision integrated navigation method based on a star camera, which is mounted on the aircraft to measure the LOS vectors of space targets with available position-related information, such as low-orbit satellites with a known ephemeris [20]. Different from the traditional star sensor, the star camera captures images of both the low-orbit satellites and the background stars in the field of view (FOV), and the LOS vectors of the satellites in the inertial frame can be extracted directly using the star catalogue and the least squares (LS) algorithm. Current star cameras have the ability to observe low-orbit satellites and stars simultaneously during the daytime when attached to unmanned aircraft in near space. The measurements of the IMU and the star camera can be fused with a navigation filter in order to estimate the kinematic state of the aircraft.
The benefits of the presented navigation method are rooted in three aspects. Firstly, for unmanned aircraft in near space, the observation of the space targets, which are widely distributed within the celestial sphere, is less sensitive to the effect of clouds in comparison with landmark-based navigation [21,22]. Secondly, as it is easier to accurately measure the LOS vector of a space target than the direction vector of the Earth's center using current technology, the presented method is promising for achieving high accuracy compared to navigation approaches based on starlight and the horizon reference [23,24]. Thirdly, both the position and the attitude information of the unmanned aircraft can be measured with the star camera, which is able to observe the space target and the starlight simultaneously.
The main contributions of this work are as follows. Firstly, a novel autonomous navigation method is presented to determine the position, velocity and attitude of unmanned aircraft. Secondly, the theoretical accuracy of the presented method is analyzed through the calculation of the Cramer–Rao lower bounds (CRLB) based on the visibility analysis of the space targets. Thirdly, a Q-learning extended Kalman filter (QLEKF), designed to improve the navigation accuracy of space-target-based INS/Vision integration, is discussed.
This paper is divided into six sections. Following this introduction, the state and observation models of space-target-based INS/Vision integrated navigation are established in Section 2. Then, the method of the visibility analysis is outlined in Section 3. In addition, the QLEKF structure and an explanation of the learning process are given in Section 4. To demonstrate the performance of the presented method, a CRLB analysis and simulation study are described in Section 5. Finally, a brief conclusion is offered in Section 6.

2. System Model

2.1. Main Idea

The presented navigation method is based on the observation of a space target LOS vector, as shown in Figure 1. The LOS vector of the space target in the inertial frame is acquired through the processing of the image captured by the star camera, which contains both the space targets with a known ephemeris and the background stars in the star catalog. As the observation of the space target LOS vector can be implemented by the star camera alone, it does not depend on the misalignments between different instruments, so the possible sources of error are reduced. The observations of the space target LOS vector and the starlight vector are used to determine the position and attitude of the aircraft. It should be mentioned that the starlight vectors of the stars in the inertial frame can be calculated with an accuracy better than 1 mas. The number and magnitude of the observed stars depend on the FOV and the sensitivity of the star camera. Generally, a star camera with an FOV on the order of 10° × 10° and a sensitivity of about 8 Mv is feasible for navigation.
The space-target-based INS/Vision integrated navigation system is mainly composed of three parts: the star camera, the IMU and the navigation filtering algorithm, as shown in Figure 2. The IMU is composed of three orthogonal accelerometers and three orthogonal gyroscopes, which are used to measure the angular rate and the specific force of the unmanned aircraft.
The predictions of the aircraft's position, velocity and attitude are obtained from the measurements taken by the gyroscopes and the accelerometers. The navigation filter, which is designed based on the state model and the observation model of the navigation system, is implemented to update the predictions of the position, velocity and attitude using the measurements from the star camera. As the position, velocity and attitude of the unmanned aircraft are predicted using the IMU with a high data update rate, real-time accuracy can be guaranteed provided that the exposure time of the star camera is recorded accurately. For the implementation of the navigation method, multiple space targets should be tracked by the ground stations, and the ephemeris of the space targets should be updated before the unmanned aircraft takes off. Then, the positions of the space targets can be obtained through orbital propagation during the flight of the aircraft, as illustrated by the sketch below.
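As an illustration of this orbital propagation step, the following minimal sketch (our own simplified example, not the authors' implementation) advances a target's inertial position and velocity under two-body gravity with a fixed-step fourth-order Runge–Kutta integrator; in practice, the perturbation terms discussed in Section 2.3 would be added to the acceleration.

```python
import numpy as np

MU_E = 3.986004418e14  # Earth's gravitational constant, m^3/s^2

def two_body_rates(x):
    # Time derivative of the stacked state [r, v] under two-body gravity.
    r, v = x[:3], x[3:]
    a = -MU_E * r / np.linalg.norm(r) ** 3
    return np.concatenate([v, a])

def propagate_target(x0, dt, steps):
    # Fixed-step RK4 propagation of a space target's ephemeris state.
    x = x0.copy()
    for _ in range(steps):
        k1 = two_body_rates(x)
        k2 = two_body_rates(x + 0.5 * dt * k1)
        k3 = two_body_rates(x + 0.5 * dt * k2)
        k4 = two_body_rates(x + dt * k3)
        x = x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x
```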

2.2. Model for the Attitude Determination

The star camera and the gyroscopes are integrated to generate the attitude solution of the unmanned aircraft via the observation of starlight, so as to correct the attitude error and the gyroscope drift. The quaternion $q_k = [q_{1k}\ q_{2k}\ q_{3k}\ q_{4k}]^T$ is used to describe the attitude of the aircraft [25,26]. For the attitude determination system composed of the star camera and the gyroscopes, the state vector $x_{A,k}$ is considered as:
$$x_{A,k} = [\delta\rho_k^T\ \ \delta b_k^T]^T \tag{1}$$
where $\delta b_k = b_k - \hat{b}_k$ is the estimation error of the gyroscope drift, $b_k$ and $\hat{b}_k$ are the gyroscope drift and its estimate, respectively, and the subscript $k$ is the time label. $\delta\rho_k = [\delta q_{1k}\ \delta q_{2k}\ \delta q_{3k}]^T$ is the vector part of the error quaternion $\delta q_k = [\delta\rho_k^T\ \delta q_{4k}]^T$, where $\delta q_{4k} = (1 - \|\delta\rho_k\|^2)^{0.5}$ and the operator $\|\cdot\|$ represents the Euclidean norm. The error quaternion $\delta q_k$ represents the difference between the estimated quaternion $\hat{q}_k = [\hat{q}_{1k}\ \hat{q}_{2k}\ \hat{q}_{3k}\ \hat{q}_{4k}]^T$ and the true quaternion $q_k$, which is expressed as:
$$\delta q_k = q_k \otimes \hat{q}_k^{-1} \tag{2}$$
or
$$q_k = \delta q_k \otimes \hat{q}_k \tag{3}$$
The operator $\otimes$ represents the quaternion product, which is defined as:
$$q_k \otimes \hat{q}_k^{-1} = \left[\Psi(q_k)\ \ q_k\right]\hat{q}_k^{-1} \tag{4}$$
with
$$\hat{q}_k^{-1} = [-\hat{q}_{1k}\ -\hat{q}_{2k}\ -\hat{q}_{3k}\ \hat{q}_{4k}]^T \tag{5}$$
$$\Psi(q_k) = \begin{bmatrix} q_{4k} I_{3\times3} + [\rho_k\times] \\ -\rho_k^T \end{bmatrix} \tag{6}$$
where $\rho_k = [q_{1k}\ q_{2k}\ q_{3k}]^T$ and $[\rho_k\times]$ is the cross-product matrix defined by:
$$[\rho_k\times] = \begin{bmatrix} 0 & -q_{3k} & q_{2k} \\ q_{3k} & 0 & -q_{1k} \\ -q_{2k} & q_{1k} & 0 \end{bmatrix} \tag{7}$$
The discrete-time state model of the integrated navigation system for the attitude determination is expressed as:
$$x_{A,k} = F_{A,k}\,x_{A,k-1} + w_{A,k} \tag{8}$$
with
$$F_{A,k} = \begin{bmatrix} I_{3\times3} - \tau_A[\omega_k\times] & -\dfrac{\tau_A}{2} I_{3\times3} \\ 0_{3\times3} & I_{3\times3} \end{bmatrix} \tag{9}$$
where $\tau_A$ is the time step for the state prediction and $\omega_k = [\omega_{xk}\ \omega_{yk}\ \omega_{zk}]^T$ is the estimated angular velocity, which is obtained from the gyroscope measurements after compensation for the drift. The matrix $[\omega_k\times]$ is formulated as:
$$[\omega_k\times] = \begin{bmatrix} 0 & -\omega_{zk} & \omega_{yk} \\ \omega_{zk} & 0 & -\omega_{xk} \\ -\omega_{yk} & \omega_{xk} & 0 \end{bmatrix} \tag{10}$$
The process noise $w_{A,k}$ is modeled as zero-mean Gaussian white noise with covariance matrix $Q_{A,k}$.
For simplicity, as the star camera is mounted on the aircraft, the aircraft body frame and the star camera frame are not distinguished in this paper. For the attitude determination, the starlight vectors in the aircraft body frame are observed using the star camera. The measurement is considered as:
$$y_{A,k} = \begin{bmatrix} u_{Bk}^{(1)} - \hat{u}_{Bk}^{(1)} \\ u_{Bk}^{(2)} - \hat{u}_{Bk}^{(2)} \\ \vdots \\ u_{Bk}^{(N)} - \hat{u}_{Bk}^{(N)} \end{bmatrix} \tag{11}$$
where $u_{Bk}^{(j)}$ and $\hat{u}_{Bk}^{(j)} = [\hat{u}_{Bxk}^{(j)}\ \hat{u}_{Byk}^{(j)}\ \hat{u}_{Bzk}^{(j)}]^T$ $(j = 1, 2, \ldots, N)$ are the starlight vector of the $j$-th star in the aircraft body frame and its estimate, and $N$ is the number of visible stars in the FOV of the star camera.
To construct an observation model that relates the state vector to the measurements, the following function of the quaternion is introduced:
$$u_{Bk}^{(j)} = A(q_k)\,u_{Ik}^{(j)} \tag{12}$$
where $u_{Ik}^{(j)}$ is the starlight vector of the $j$-th star in the inertial frame, which can be calculated according to the star catalog, while $A(q_k)$ is the attitude transformation matrix from the inertial frame to the aircraft body frame, which is formulated as:
$$A(q_k) = (q_{4k}^2 - \rho_k^T\rho_k)\,I_{3\times3} + 2\rho_k\rho_k^T - 2 q_{4k}[\rho_k\times] \tag{13}$$
Accordingly, $\hat{u}_{Bk}^{(j)}$ is calculated as:
$$\hat{u}_{Bk}^{(j)} = A(\hat{q}_k)\,u_{Ik}^{(j)} \tag{14}$$
Substituting (3) into (12), we have:
$$u_{Bk}^{(j)} = A(\delta q_k \otimes \hat{q}_k)\,u_{Ik}^{(j)} = A(\delta q_k)\,A(\hat{q}_k)\,u_{Ik}^{(j)} \tag{15}$$
Inserting (14) into (15) yields:
$$u_{Bk}^{(j)} = A(\delta q_k)\,\hat{u}_{Bk}^{(j)} \tag{16}$$
As the measurement accuracy of the star camera is on the order of a few arcseconds, neglecting the high-order terms, the matrix $A(\delta q_k)$ is simplified as:
$$A(\delta q_k) = \begin{bmatrix} 1 & 2\delta q_{3k} & -2\delta q_{2k} \\ -2\delta q_{3k} & 1 & 2\delta q_{1k} \\ 2\delta q_{2k} & -2\delta q_{1k} & 1 \end{bmatrix} \tag{17}$$
From (16) and (17), we obtain:
$$u_{Bk}^{(j)} - \hat{u}_{Bk}^{(j)} = 2\,[\hat{u}_{Bk}^{(j)}\times]\,\delta\rho_k \tag{18}$$
where
$$[\hat{u}_{Bk}^{(j)}\times] = \begin{bmatrix} 0 & -\hat{u}_{Bzk}^{(j)} & \hat{u}_{Byk}^{(j)} \\ \hat{u}_{Bzk}^{(j)} & 0 & -\hat{u}_{Bxk}^{(j)} \\ -\hat{u}_{Byk}^{(j)} & \hat{u}_{Bxk}^{(j)} & 0 \end{bmatrix} \tag{19}$$
According to (11) and (18), for synchronously observed multiple stars, the measurement model is expressed as:
$$y_{A,k} = H_{A,k}\,x_{A,k} + v_{A,k} \tag{20}$$
where
$$H_{A,k} = \begin{bmatrix} 2[\hat{u}_{Bk}^{(1)}\times] & 0_{3\times3} \\ 2[\hat{u}_{Bk}^{(2)}\times] & 0_{3\times3} \\ \vdots & \vdots \\ 2[\hat{u}_{Bk}^{(N)}\times] & 0_{3\times3} \end{bmatrix} \tag{21}$$
The measurement noise $v_{A,k}$ is modeled as zero-mean Gaussian white noise with covariance matrix $R_{A,k}$.
Using the state model shown in (8) and the measurement model shown in (20), it is convenient to design a Kalman filter (KF) to estimate the state vector $x_{A,k}$ from the measurement $y_{A,k}$. The estimates of $\delta\rho_k$ and $\delta b_k$ are then used to correct the attitude quaternion $\hat{q}_k$ and the gyroscope drift $\hat{b}_k$. The procedure of the attitude determination using the Kalman filtering algorithm is similar to that in [27], and it is omitted here for simplicity.
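To make the construction of the stacked measurement (11) and measurement matrix (21) concrete, the following sketch shows one possible implementation; the function names and data layout are our own illustrative choices, not the authors' code.

```python
import numpy as np

def skew(u):
    # Cross-product matrix [u x], as in (7) and (19).
    return np.array([[0.0, -u[2], u[1]],
                     [u[2], 0.0, -u[0]],
                     [-u[1], u[0], 0.0]])

def attitude_residual(u_b, u_b_hat):
    # Stacked measurement vector of (11): differences of observed and
    # predicted body-frame starlight unit vectors.
    return np.concatenate([u - uh for u, uh in zip(u_b, u_b_hat)])

def attitude_measurement_matrix(u_b_hat):
    # Measurement matrix of (21): one [2[u_hat x], 0_3x3] block per star.
    return np.vstack([np.hstack([2.0 * skew(uh), np.zeros((3, 3))])
                      for uh in u_b_hat])
```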

2.3. Model for the Position Estimation

In space-target-based INS/Vision integration, the space target LOS vectors observed using the star camera are used to correct the INS position and velocity errors in the inertial frame. The state vector for the position estimation consists of the position vector $r_k = [r_{xk}\ r_{yk}\ r_{zk}]^T$, the velocity vector $v_k$ and the accelerometer bias $\nabla_k$, and is expressed as:
$$x_{P,k} = [r_k^T\ \ v_k^T\ \ \nabla_k^T]^T \tag{22}$$
The state model is established to describe the dynamics of the state vector, which is represented as:
$$x_{P,k} = f(x_{P,k-1}) + w_{P,k} \tag{23}$$
where $w_{P,k}$ is the process noise with the covariance matrix $Q_{P,k}$. The discretized nonlinear function $f(\cdot)$ is formulated as:
$$f(x_{P,k}) = \begin{bmatrix} r_k \\ v_k \\ \nabla_k \end{bmatrix} + \begin{bmatrix} v_k \\ A^T(q_k)(f_{Bk} + \nabla_k) + g(x_{P,k}) \\ 0 \end{bmatrix}\tau_P \tag{24}$$
where $\tau_P$ is the time step for the state prediction, $q_k$ can be obtained through the method shown in the previous subsection, $f_{Bk}$ is the output of the accelerometers in the aircraft body frame, and $g(x_{P,k})$ is a function used to describe the gravitational acceleration, which can be written as:
$$g(x_{P,k}) = -\frac{\mu_E\,r_k}{\|r_k\|^3} + p(r_k) \tag{25}$$
where $\mu_E$ is the gravitational constant of the Earth and the function $p(r_k)$ encapsulates the perturbation accelerations of the aircraft other than those caused by the two-body gravitational acceleration. The Jacobian matrix of the state model is derived as:
$$F_{P,k} = \frac{\partial f(x_{P,k})}{\partial x_{P,k}} = I_{9\times9} + \Phi_{P,k}\,\tau_P \tag{26}$$
with
$$\Phi_{P,k} = \begin{bmatrix} 0_{3\times3} & I_{3\times3} & 0_{3\times3} \\ \dfrac{\partial g(x_{P,k})}{\partial x_{P,k}} & 0_{3\times3} & A^T(q_k) \\ 0_{3\times3} & 0_{3\times3} & 0_{3\times3} \end{bmatrix} \tag{27}$$
Let
$$\frac{\partial g(x_{P,k})}{\partial x_{P,k}} = \begin{bmatrix} \Phi_{P,k}(4,1) & \Phi_{P,k}(4,2) & \Phi_{P,k}(4,3) \\ \Phi_{P,k}(5,1) & \Phi_{P,k}(5,2) & \Phi_{P,k}(5,3) \\ \Phi_{P,k}(6,1) & \Phi_{P,k}(6,2) & \Phi_{P,k}(6,3) \end{bmatrix} \tag{28}$$
For simplicity, only the two-body gravitational acceleration is taken into consideration. The elements in the matrix shown in (28) are written as:
$$\Phi_{P,k}(4,1) = -\frac{\mu_E}{\|r_k\|^3}\left(1 - \frac{3 r_{xk}^2}{\|r_k\|^2}\right) \tag{29}$$
$$\Phi_{P,k}(4,2) = \frac{3\mu_E\,r_{xk} r_{yk}}{\|r_k\|^5} \tag{30}$$
$$\Phi_{P,k}(4,3) = \frac{3\mu_E\,r_{xk} r_{zk}}{\|r_k\|^5} \tag{31}$$
$$\Phi_{P,k}(5,1) = \Phi_{P,k}(4,2) \tag{32}$$
$$\Phi_{P,k}(5,2) = -\frac{\mu_E}{\|r_k\|^3}\left(1 - \frac{3 r_{yk}^2}{\|r_k\|^2}\right) \tag{33}$$
$$\Phi_{P,k}(5,3) = \frac{3\mu_E\,r_{yk} r_{zk}}{\|r_k\|^5} \tag{34}$$
$$\Phi_{P,k}(6,1) = \Phi_{P,k}(4,3) \tag{35}$$
$$\Phi_{P,k}(6,2) = \Phi_{P,k}(5,3) \tag{36}$$
$$\Phi_{P,k}(6,3) = -\frac{\mu_E}{\|r_k\|^3}\left(1 - \frac{3 r_{zk}^2}{\|r_k\|^2}\right) \tag{37}$$
The space target LOS vector in the inertial frame is used as the measurement for the position estimation. Accordingly, the measurement model is expressed as:
$$y_{P,k}^{(i)} = h^{(i)}(x_{P,k}) + v_{P,k}^{(i)} \tag{38}$$
with
$$h^{(i)}(x_{P,k}) = \frac{r_{Tk}^{(i)} - r_k}{\|r_{Tk}^{(i)} - r_k\|} \tag{39}$$
where $y_{P,k}^{(i)}$ is the measurement of the $i$-th space target LOS vector, $r_{Tk}^{(i)}$ is the position vector of the $i$-th space target, which can be calculated according to the known ephemeris and orbit propagation, and $v_{P,k}^{(i)}$ is the measurement noise with the covariance matrix $R_{P,k}^{(i)}$. The corresponding Jacobian matrix is:
$$H_{P,k}^{(i)} = \frac{\partial h^{(i)}(x_{P,k})}{\partial x_{P,k}} = \begin{bmatrix} -\dfrac{I_{3\times3}}{\|r_{Tk}^{(i)} - r_k\|} + \dfrac{(r_{Tk}^{(i)} - r_k)(r_{Tk}^{(i)} - r_k)^T}{\|r_{Tk}^{(i)} - r_k\|^3} & 0_{3\times3} & 0_{3\times3} \end{bmatrix} \tag{40}$$
According to the state model shown in (23) and the measurement model shown in (38), the extended Kalman filter (EKF) can be designed to fuse the accelerometer and star camera data. Furthermore, an advanced filtering algorithm, which is presented in Section 4, can be used to enhance the navigation performance.
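As a minimal sketch of the measurement model (39) and its Jacobian (40), the following illustrative code of ours assumes the 9-dimensional state ordering of (22):

```python
import numpy as np

def los_measurement(r_target, r_aircraft):
    # Predicted unit LOS vector h(x) of (39).
    d = r_target - r_aircraft
    return d / np.linalg.norm(d)

def los_jacobian(r_target, r_aircraft):
    # 3x9 Jacobian H of (40) with respect to the state [r, v, bias].
    d = r_target - r_aircraft
    rho = np.linalg.norm(d)
    H_r = -np.eye(3) / rho + np.outer(d, d) / rho ** 3
    return np.hstack([H_r, np.zeros((3, 3)), np.zeros((3, 3))])
```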

3. Principle of Visibility Analysis

The visibility analysis of the space targets is the basis for the performance evaluation of the integrated navigation method. In practice, only the visible space targets can be observed with the star camera in order to provide valuable navigation-related information. The user should confirm whether certain space targets are visible to the aircraft before the implementation of the navigation scheme. To facilitate the application, certain principles used to analyze the visibility of the space targets are described in this section.
First, for a space target to be visible, it should be bright enough to be detected by the star camera, i.e., its apparent magnitude should satisfy:
$$m_T^{(i)} < m_0 \tag{41}$$
where $m_T^{(i)}$ is the apparent magnitude of the $i$-th space target and $m_0$ is a magnitude bound specified by the sensitivity of the star camera. To simplify the notation, the time label $k$ is omitted in this section. The apparent magnitude $m_T^{(i)}$ can be calculated as:
$$m_T^{(i)} = M_T^{(i)} + 2.5\log_{10}\!\left[\frac{\|r_{Sun} - r_T^{(i)}\|^2\,\|r - r_T^{(i)}\|^2}{d_0^4\,p(\xi^{(i)})}\right] \tag{42}$$
where $M_T^{(i)}$ is the absolute magnitude of the space target, which is approximated by:
$$M_T^{(i)} = m_{Sun} - 5\log_{10}\!\left(\frac{r_d^{(i)}}{d_0}\sqrt{a_l^{(i)}}\right) \tag{43}$$
where $m_{Sun} \approx -27$ is the apparent magnitude of the Sun, $r_d^{(i)}$ is the observed body radius, $a_l^{(i)}$ is the body albedo of the space target and $d_0 \approx 1.4959787 \times 10^8\ \mathrm{km}$ is an astronomical unit. $r_{Sun}$ is the position vector of the Sun. $p(\xi^{(i)})$ is a function describing the integration of the reflected light, which is approximated by:
$$p(\xi^{(i)}) = \frac{2}{3\pi}\left[(\pi - \xi^{(i)})\cos\xi^{(i)} + \sin\xi^{(i)}\right] \tag{44}$$
where $\xi^{(i)}$ is the Sun–target–aircraft angle of the $i$-th space target, which is expressed as:
$$\xi^{(i)} = \cos^{-1}\!\left[\frac{r_{Sun} - r_T^{(i)}}{\|r_{Sun} - r_T^{(i)}\|} \cdot \frac{r - r_T^{(i)}}{\|r - r_T^{(i)}\|}\right] \tag{45}$$
It describes the relative positions of the Sun, the space target and the unmanned aircraft, as shown in Figure 3.
Secondly, a necessary condition for the visibility of the space target is that it is not hidden by the shadow of Earth, which can be judged with the following condition:
$$\psi^{(i)} < 90° + \beta^{(i)} \tag{46}$$
where $\psi^{(i)}$ is the Sun–Earth–target angle of the $i$-th target, which is expressed as:
$$\psi^{(i)} = \cos^{-1}\!\left(\frac{r_T^{(i)}}{\|r_T^{(i)}\|} \cdot \frac{r_{Sun}}{\|r_{Sun}\|}\right) \tag{47}$$
From Figure 4, the angle bound $\beta^{(i)}$ is calculated as:
$$\beta^{(i)} = \sin^{-1}\!\left[\frac{\left(\|r_T^{(i)}\|^2 - R_e^2\right)^{0.5}}{\|r_T^{(i)}\|}\right] \tag{48}$$
where $R_e$ is the Earth's radius. The space target is illuminated by the Sun and can be seen as a candidate observation object when it is not within the shadow of the Earth.
Thirdly, for the space target to be visible, it must not be occulted by celestial bodies, such as the Earth, Sun and Moon. A space target is believed to be occulted when its LOS vector points to a celestial body. Generally, the LOS vector of a space target cannot be measured effectively by the star camera in this case, as the images of the space target and the celestial body overlap. For example, the relative positions of the space target, the aircraft and Earth are shown in Figure 5.
From the figure, to judge whether the space target is occulted by the Earth, the following condition is given:
$$\theta^{(i)} > \theta_0 \tag{49}$$
where $\theta^{(i)}$ is the Earth–aircraft–target angle, which is expressed as:
$$\theta^{(i)} = \cos^{-1}\!\left(\frac{r - r_T^{(i)}}{\|r - r_T^{(i)}\|} \cdot \frac{r}{\|r\|}\right) \tag{50}$$
The angle bound $\theta_0$ is calculated as:
$$\theta_0 = \sin^{-1}\!\left(\frac{R_e}{\|r\|}\right) \tag{51}$$
Similar conditions can be derived to judge whether the space target is occulted by the Sun or Moon.
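The three conditions (41), (46) and (49) can be combined into a single visibility test. The sketch below is an illustrative implementation of ours, not the authors' code; the magnitude limit of 8 Mv follows Table 1, and occultation by the Sun and Moon is omitted for brevity.

```python
import numpy as np

R_E = 6378.137e3   # Earth's radius, m
M_0 = 8.0          # star camera sensitivity bound, Mv (Table 1)

def is_visible(r_t, r_ac, r_sun, m_apparent):
    # Condition 1: bright enough for the star camera, Equation (41).
    if m_apparent >= M_0:
        return False
    # Condition 2: not in Earth's shadow, Equations (46)-(48).
    psi = np.arccos(np.dot(r_t, r_sun)
                    / (np.linalg.norm(r_t) * np.linalg.norm(r_sun)))
    beta = np.arcsin(np.sqrt(np.linalg.norm(r_t) ** 2 - R_E ** 2)
                     / np.linalg.norm(r_t))
    if psi >= np.pi / 2.0 + beta:
        return False
    # Condition 3: not occulted by the Earth, Equations (49)-(51).
    d = r_ac - r_t
    theta = np.arccos(np.dot(d, r_ac)
                      / (np.linalg.norm(d) * np.linalg.norm(r_ac)))
    return theta > np.arcsin(R_E / np.linalg.norm(r_ac))
```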

4. Q-Learning Extended Kalman Filter

4.1. Q-Learning for the Filter Design

In this section, we describe an advanced navigation filter designed to estimate the position and velocity of the unmanned aircraft using the IMU and the star camera mounted on the aircraft in cases where the GNSS is unavailable. The EKF is a classic algorithm used to fuse the measurements of different sensors. An important procedure in the optimization of the EKF is the fine-tuning of the filtering parameters, such as the noise covariance matrices. However, noise covariance matrices pre-determined from prior information may not be appropriate for an aircraft during flight, which degrades the state estimation. In order to address this problem, we tune the process noise covariance online from the measurement data using the Q-learning approach, so as to achieve a higher accuracy of the navigation system.
Q-learning is a reinforcement learning (RL) approach for solving sequential decision problems, where the agent interacts with an initially unknown environment and optimizes its behavior in order to maximize its cumulative reward [28,29,30,31]. Q-learning has received considerable attention, with many successful applications in various fields, such as gaming, robotics, industrial process control and network management. Figure 6 illustrates the main components of Q-learning. An agent takes an action $a$ in the environment described by the state $s$, which may change the environment's state (from $s$ to $s'$), and receives an immediate reward $R(s,a)$ from the environment as feedback on the action. The learning objective is to modify the action selection policy such that the cumulative reward, which is represented by the Q function $Q_i(s,a)$, is maximized. Q-learning has been studied as an important data-based approach for adaptive control and optimal filtering and is preferred for practical systems with uncertain models [32,33,34].
In this study, to deal with the continuous state and action spaces, a Q-learning method with function approximation is adopted, where the Q function is expressed as:
$$Q_i(s,a) = \Phi^T(s,a)\,\Theta_i \tag{52}$$
with
$$\Phi(s,a) = [\phi_1(s,a)\ \phi_2(s,a)\ \cdots\ \phi_D(s,a)]^T, \qquad \Theta_i = [\theta_1^i\ \theta_2^i\ \cdots\ \theta_D^i]^T \tag{53}$$
where $\phi_d(s,a)$ $(d = 1, 2, \ldots, D)$ are basis functions, $\theta_d^i$ $(d = 1, 2, \ldots, D)$ are weights and the subscript $i$ is the number of iterations. In this paper, the basis function is chosen as:
$$\phi_d(s,a) = \exp\!\left(-\begin{bmatrix} s-\mu_{sd} \\ a-\mu_{ad} \end{bmatrix}^T \begin{bmatrix} \sigma_{sd}^{-2} & 0 \\ 0 & \sigma_{ad}^{-2} \end{bmatrix} \begin{bmatrix} s-\mu_{sd} \\ a-\mu_{ad} \end{bmatrix}\right) \left\{\sum_{\tau=1}^{D}\exp\!\left(-2\begin{bmatrix} s-\mu_{s\tau} \\ a-\mu_{a\tau} \end{bmatrix}^T \begin{bmatrix} \sigma_{s\tau}^{-2} & 0 \\ 0 & \sigma_{a\tau}^{-2} \end{bmatrix} \begin{bmatrix} s-\mu_{s\tau} \\ a-\mu_{a\tau} \end{bmatrix}\right)\right\}^{-0.5} \tag{54}$$
where $\mu_{sd}$, $\mu_{ad}$, $\sigma_{sd}$ and $\sigma_{ad}$ are the parameters of the basis function. It should be mentioned that the choice of the basis function is left to the user, and a poor choice may lead to poor filtering performance. In this study, the basis function is designed with a structure similar to that in [35], such that the appropriate action can be activated. The incremental weight update rule of Q-learning is given by:
$$\Theta_{i+1} = \Theta_i + \alpha\left[R(s,a) + \gamma \max_{a'}\left(\Phi^T(s',a')\,\Theta_i\right) - \Phi^T(s,a)\,\Theta_i\right]\Phi(s,a) \tag{55}$$
where $0 < \alpha < 1$ is the learning rate and $0 < \gamma \le 1$ is the discount factor; $\alpha$ and $\gamma$ are the tuning parameters. It is easy to observe from the previous equations that:
$$\Phi^T(s,a)\,\Theta_{i+1} = \Phi^T(s,a)\,\Theta_i + \alpha\left[R(s,a) + \gamma \max_{a'}\left(\Phi^T(s',a')\,\Theta_i\right) - \Phi^T(s,a)\,\Theta_i\right]\Phi^T(s,a)\,\Phi(s,a) \tag{56}$$
or, since the basis vector in (54) is normalized such that $\Phi^T(s,a)\,\Phi(s,a) = 1$:
$$Q_{i+1}(s,a) = Q_i(s,a) + \alpha\left[R(s,a) + \gamma \max_{a'} Q_i(s',a') - Q_i(s,a)\right] \tag{57}$$
The convergence of the iterative Q function is studied in the following theorem, which can help readers to comprehend the key idea of the learning procedure.
Theorem 1:
Considering the iterative Q function sequence $\{Q_i(s,a),\ i = 1, 2, \ldots\}$, if the initial Q function $Q_0(s,a)$ satisfies the following condition:
$$0 \le Q_0(s,a) \le R(s,a) + \gamma \max_{a'} Q_0(s',a') \tag{58}$$
and there exist constants $\bar{\delta} \ge 1 \ge \underline{\delta} \ge 0$ and $\lambda \ge 0$ such that the following inequalities hold:
$$\underline{\delta}\,\bar{Q}(s,a) \le Q_0(s,a) \le \bar{\delta}\,\bar{Q}(s,a) \tag{59}$$
$$\lambda R(s,a) \ge \max_{a'} \bar{Q}(s',a') \tag{60}$$
where
$$\bar{Q}(s,a) = R(s,a) + \max_{a'} \bar{Q}(s',a') \tag{61}$$
then the iterative Q function $Q_i(s,a)$ converges to the optimal Q function $Q^*(s,a)$ as $i \to \infty$, i.e.,
$$\lim_{i\to\infty} Q_i(s,a) = Q^*(s,a) \tag{62}$$
where the function $Q^*(s,a)$ is formulated as:
$$Q^*(s,a) = R(s,a) + \gamma \max_{a'} Q^*(s',a') \tag{63}$$
The proof of the theorem is provided in Appendix A. It shows that, under certain conditions, the iterative Q function $Q_i(s,a)$ converges to the optimal Q function $Q^*(s,a)$, which represents the maximum cumulative reward. Generally, if the constants $\bar{\delta}$ and $\lambda$ are sufficiently large and $\underline{\delta}$ is sufficiently small, the conditions shown in (59) and (60) can be satisfied.
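As an illustration of the update rule (55) with the normalized basis (54), the following sketch of ours performs one temporal-difference step on the weight vector; for readability, the diagonal quadratic form of (54) is written componentwise, and the learning rate and discount factor follow Table 1.

```python
import numpy as np

def features(s, a, mu_s, mu_a, sig_s, sig_a):
    # Normalized Gaussian basis vector Phi(s, a) of (54); the
    # normalization ensures Phi(s, a)^T Phi(s, a) = 1.
    q = ((s - mu_s) / sig_s) ** 2 + ((a - mu_a) / sig_a) ** 2
    return np.exp(-q) / np.sqrt(np.sum(np.exp(-2.0 * q)))

def q_update(theta, s, a, r, s_next, actions, feat, alpha=0.1, gamma=0.9):
    # One weight update of (55): a temporal-difference step on Phi^T Theta.
    q_next = max(feat(s_next, a2) @ theta for a2 in actions)
    phi_sa = feat(s, a)
    td_error = r + gamma * q_next - phi_sa @ theta
    return theta + alpha * td_error * phi_sa
```

Here `mu_s`, `mu_a`, `sig_s` and `sig_a` are arrays holding the $D$ basis-function centers and widths, and `feat` is a two-argument closure over them.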

4.2. Filter Algorithm

Based on the system model for space-target-based INS/Vision integrated navigation, we designed the QLEKF algorithm, where Q-learning is coupled with the EKF for the fine-tuning of the filter parameters with the objective of improving the estimation accuracy. The process noise covariance matrix was chosen for tuning, as it is crucial for calculating the filtering gain so that the information contained in the system model and in the measurements is utilized reasonably. In the Q-learning approach, the state $s$ is related to the current value of the process noise covariance matrix; for example, it can be assigned as 0 for the nominal noise covariance matrix. The action $a$ is related to the magnitude by which the current process noise covariance matrix is enlarged or reduced. It can be assigned different numbers, such as 0, 1, 2, …, for different degrees of tuning. The reward $R(s,a)$ is constructed based on the innovation of the EKF, which is a well-known indicator of the estimation accuracy of the filter. Generally, a smaller innovation indicates a superior filtering performance, and vice versa.
For the system model shown in (23) and (38), the QLEKF algorithm is formulated in Algorithm 1. For simplicity, the subscript $i$ of the weight vector $\Theta_i$, representing the number of iterations, is omitted here.
Algorithm 1: Q-learning extended Kalman filter
1: $\hat{x}_{P,0}^B = \hat{x}_{P,0}^E = \hat{x}_{P,0}$, $P_{P,0}^B = P_{P,0}^E = P_{P,0}$  ▷ Initialization
2: $k \leftarrow 0$
3: $\Theta \leftarrow 0$
4: for each period do
5:   for all $a \in A$ do
6:     $C^B \leftarrow 0$, $C^E \leftarrow 0$
7:     for $t = 1, 2, \ldots, T$ do
8:       $k \leftarrow k + 1$
9:       $[\hat{x}_{P,k}^B, P_{P,k}^B, \tilde{y}_{P,k}^B] \leftarrow \mathrm{EKF}(\hat{x}_{P,k-1}^B, P_{P,k-1}^B, y_{P,k}, Q_{P,k}, R_{P,k})$  ▷ Benchmark filter
10:      $[\hat{x}_{P,k}^E, P_{P,k}^E, \tilde{y}_{P,k}^E] \leftarrow \mathrm{EKF}(\hat{x}_{P,k-1}^E, P_{P,k-1}^E, y_{P,k}, \hat{Q}_{P,k}(s,a), R_{P,k})$  ▷ Exploring filter
11:      $[\hat{x}_{P,k}, P_{P,k}, \tilde{y}_{P,k}] \leftarrow \mathrm{EKF}(\hat{x}_{P,k-1}, P_{P,k-1}, y_{P,k}, \hat{Q}_{P,k}(s,a_{\max}), R_{P,k})$  ▷ Main filter
12:      $C^B \leftarrow C^B + \frac{1}{T-1}\left[(\tilde{y}_{P,k}^B)^T R_{P,k}^{-1}\,\tilde{y}_{P,k}^B - C^B\right]$
13:      $C^E \leftarrow C^E + \frac{1}{T-1}\left[(\tilde{y}_{P,k}^E)^T R_{P,k}^{-1}\,\tilde{y}_{P,k}^E - C^E\right]$
14:    end for
15:    $R(s,a) \leftarrow C^B - C^E$  ▷ Calculation of reward
16:    $\Theta \leftarrow \Theta + \alpha\left[R(s,a) + \gamma \max_{a'}(\Phi^T(s',a')\Theta) - \Phi^T(s,a)\Theta\right]\Phi(s,a)$  ▷ Update of weights
17:    $\hat{x}_{P,k}^E \leftarrow \hat{x}_{P,k}^B$, $P_{P,k}^E \leftarrow P_{P,k}^B$  ▷ Resetting of exploring filter
18:  end for
19:  $a_{\max} \leftarrow \arg\max_a(\Phi^T(s,a)\Theta)$  ▷ Selection of the best action
20: end for
21: return $\{\hat{x}_{P,k}\}$ and $\{P_{P,k}\}$  ▷ Result of state estimation
In Algorithm 1, $A$ is the pre-determined action space to be explored, $T$ is a positive integer representing the time interval for the calculation of the immediate reward $R(s,a)$, $\hat{x}_{P,k}$ and $P_{P,k}$ are the state estimate of the main filter and its corresponding error covariance matrix, and $\tilde{y}_{P,k}$ is the innovation. Note that, in the main filter, the process noise covariance matrix $\hat{Q}_{P,k}(s,a_{\max})$ tuned through Q-learning is used instead of the $Q_{P,k}$ of the system model.
It can be seen from the algorithm that there are two parallel filters in addition to the main filter: the benchmark filter and the exploring filter, distinguished by the superscripts B and E. The difference between them is that the process noise covariance matrix of the benchmark filter, $Q_{P,k}$, is obtained from the prior knowledge of the system, while that of the exploring filter, $\hat{Q}_{P,k}(s,a)$, is altered by enlarging or reducing $Q_{P,k}$.
The statistical values of the innovations obtained from the benchmark filter and the exploring filter are compared in order to construct the immediate reward $R(s,a)$, which indicates whether the action $a$ is preferable for improving the filtering performance. Note that the estimate of the exploring filter is reset after each update of the weight vector $\Theta$ to avoid the impact of historical data on the evaluation of the action. An appropriate action $a_{\max}$ for tuning the process noise covariance matrix of the main filter is determined from the approximated Q function $\Phi^T(s,a)\Theta$. The output of the QLEKF is the estimation result of the main filter.
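A sketch of this reward construction (lines 12–15 of Algorithm 1), in which the running averages of the normalized innovation quadratics of the two filters are differenced, is given below; it is illustrative code of ours, not the authors' implementation.

```python
import numpy as np

def innovation_cost(innovations, R_inv, T):
    # Running statistic of lines 12-13: C <- C + [y~^T R^-1 y~ - C]/(T-1).
    c = 0.0
    for y in innovations:
        c += ((y @ R_inv @ y) - c) / (T - 1)
    return c

def reward(innov_benchmark, innov_explore, R_inv, T):
    # Line 15: R(s, a) = C^B - C^E; positive when the exploring filter's
    # innovations are smaller, i.e., the explored action is preferable.
    return (innovation_cost(innov_benchmark, R_inv, T)
            - innovation_cost(innov_explore, R_inv, T))
```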
For clarity, the subroutine of the EKF is shown in Algorithm 2.
Algorithm 2: Extended Kalman filter
1: function EKF($\hat{x}_{P,k-1}$, $P_{P,k-1}$, $y_{P,k}$, $Q_{P,k}$, $R_{P,k}$)
2:   $\hat{x}_{P,k|k-1} \leftarrow f(\hat{x}_{P,k-1})$  ▷ Prediction
3:   $P_{P,k|k-1} \leftarrow F_{P,k}\,P_{P,k-1}\,F_{P,k}^T + Q_{P,k}$
4:   $K_{P,k} \leftarrow P_{P,k|k-1} H_{P,k}^T \left(H_{P,k} P_{P,k|k-1} H_{P,k}^T + R_{P,k}\right)^{-1}$
5:   $\tilde{y}_{P,k} \leftarrow y_{P,k} - h(\hat{x}_{P,k|k-1})$
6:   $\hat{x}_{P,k} \leftarrow \hat{x}_{P,k|k-1} + K_{P,k}\,\tilde{y}_{P,k}$  ▷ Update
7:   $P_{P,k} \leftarrow (I - K_{P,k} H_{P,k})\,P_{P,k|k-1}\,(I - K_{P,k} H_{P,k})^T + K_{P,k} R_{P,k} K_{P,k}^T$
8:   return $\hat{x}_{P,k}$, $P_{P,k}$, $\tilde{y}_{P,k}$
9: end function
In Algorithm 2, $K_{P,k}$ denotes the Kalman gain. It is easy to see that the EKF is designed based on the system model shown in Section 2.3. The innovation $\tilde{y}_{P,k}$, as an output of Algorithm 2, is not used in the next iteration, but it is used to construct the immediate reward $R(s,a)$ in Algorithm 1.
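A compact Python rendering of Algorithm 2 could look as follows; this is a sketch of ours in which the model functions $f$, $h$ and their Jacobians from (23)–(28) and (38)–(40) are passed in as callables.

```python
import numpy as np

def ekf_step(x, P, y, Q, R, f, F_jac, h, H_jac):
    # Prediction, following Equations (23)-(26).
    x_pred = f(x)
    F = F_jac(x)
    P_pred = F @ P @ F.T + Q
    # Gain and innovation, following Equations (38)-(40).
    H = H_jac(x_pred)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    innov = y - h(x_pred)
    # Update; the covariance uses the Joseph form, as in line 7.
    x_new = x_pred + K @ innov
    I_KH = np.eye(len(x)) - K @ H
    P_new = I_KH @ P_pred @ I_KH.T + K @ R @ K.T
    return x_new, P_new, innov
```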

5. Simulations

5.1. Simulation Conditions

Simulations were performed to evaluate the performance of the space-target-based INS/Vision integrated navigation. An ideal trajectory, including the true position, velocity and attitude, was produced for an unmanned aircraft with an altitude of about 30 km and a velocity of about 680 m/s. The noises of the IMU and the star camera were added to the ideal values to obtain the simulated measurement data. For the data generation of the gyroscope, the angle random walk, the rate random walk and the bias were taken into consideration. LEO (low Earth orbit) satellites with a semi-major axis of 6938 km and an inclination of 55° were taken as the observed space targets. The construction of the satellite constellation is shown in Figure 7.
The parameters for the integrated navigation simulation are listed in Table 1.
It should be pointed out that the estimation performance and computational effort of the algorithm depend on the QLEKF parameters. The QLEKF parameters shown in Table 1 yield improved performance in the simulations while retaining computational convenience. In further work, we plan a rigorous comparative analysis of the estimation performance under different QLEKF parameters. The absolute position, velocity and attitude of the aircraft are estimated with the methods shown in Section 2 and Section 4 in cases where the positions of the space targets are known. The comparison between the state estimation and the ideal trajectory yields the navigation results.

5.2. CRLB Based on the Visibility Analysis

The visibility of the space targets for the star camera on the aircraft is analyzed using the principles shown in Section 3. The curves of the apparent magnitude of some space targets over time are shown in Figure 8. It can be seen from the figure that, as the specified sensitivity of the star camera in Table 1 is 8 Mv, some space targets are bright enough to be observed in certain parts of the orbit.
The curves of the Sun–Earth–target angles of some space targets over time are shown in Figure 9. The angle bound $90° + \beta^{(i)}$, representing the Earth's shadow, is plotted as the shaded region in the figure. From this result, we can see that the space targets are not in the Earth's shadow for most of the time.
The curves of the Earth–aircraft–target angles and Sun–aircraft–target angles of some space targets over time, together with the corresponding angle bounds, are shown in Figure 10 and Figure 11, respectively. The figures illustrate that the space targets are, at times, not occulted by the Earth or the Sun.
Based on the previous analysis, the potential performance of the presented navigation method was evaluated through the CRLB, which was derived from the system model. The CRLB provides a theoretical bound on the achievable accuracy of the navigation system. Thus, it is useful to study the feasibility of a novel navigation scheme. The calculation procedure of the CRLB can be found in the literature [36,37].
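For reference, a standard information-matrix recursion for additive Gaussian models of the type used in [36,37] is sketched below; this is our summary, not the authors' code, and the position-error bound is read off the leading diagonal entries of $J^{-1}$.

```python
import numpy as np

def crlb_step(J, F, Q, H, R_inv):
    # Information recursion: J_{k+1} = (Q + F J^-1 F^T)^-1 + H^T R^-1 H.
    J_pred = np.linalg.inv(Q + F @ np.linalg.inv(J) @ F.T)
    return J_pred + H.T @ R_inv @ H
```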
Figure 12 and Figure 13 describe the theoretical bounds of the three-axis position estimation errors of the integrated navigation system for different noise levels of the accelerometer and the star camera in cases where only the visible space targets are observed. They indicate that a high navigation performance is achievable with state-of-the-art sensors by applying the space-target-based INS/Vision integrated navigation method.

5.3. Simulation Results of the Navigation Filter

The navigation results of the space-target-based INS/Vision integration obtained from the EKF are shown in Figure 14, Figure 15 and Figure 16, including the curves of the position, velocity and attitude errors, where the solid lines represent the estimation errors and the dashed lines represent the $3\sigma$ bounds calculated from the error covariance matrix of the EKF.
As expected, the position, velocity and attitude can all be estimated correctly from the measurements of the LOS vectors of the space targets and the starlight vectors. The mean estimation errors of the position, velocity and attitude of the space-target-based INS/Vision integration were 12.96 m, 0.32 m/s and 0.94″, respectively. This shows that the presented method has the potential to realize the absolute navigation of unmanned aircraft with a high accuracy. For the implementation of the navigation method, the position error of the space target should be sufficiently smaller than the accuracy specification of the unmanned aircraft. In the simulation, only the measurements of the visible space targets are used to update the state vector; when there are no visible space targets, the state vector is predicted by the IMU. The effect of the visibility is not distinct, as there are multiple space targets, as shown in Figure 7.
Next, the ability of the QLEKF to improve the accuracy of aircraft navigation through space-target-based INS/Vision integration was assessed. Monte Carlo simulations were conducted to evaluate the performance of the QLEKF in comparison with the traditional EKF and the adaptive extended Kalman filter (AEKF) [38]. The QLEKF aims to fine-tune the noise covariance matrix obtained from the prior knowledge, while the AEKF aims to estimate the noise covariance matrix together with the state vector. Figure 17 and Figure 18 show the norms of the root mean squared errors (RMSE) of the position and attitude estimates obtained from the EKF, the AEKF and the QLEKF.
According to the simulation results, the EKF provides results similar to those of the AEKF, while the QLEKF outperforms both with higher precision on average. These results demonstrate that the Q-learning approach is valuable for fine-tuning the critical parameters of the navigation filter.

6. Conclusions

This paper presents a novel navigation method that estimates the position, velocity and attitude of unmanned aircraft, using a star camera to observe the LOS vectors of space targets with known positions. The method is especially suitable for aircraft with long-term, high-accuracy autonomous navigation requirements. Based on the mathematical model and the visibility analysis, the CRLB of the integrated navigation system was calculated in order to validate the effectiveness of the presented method. In addition, the QLEKF was designed to enhance the navigation performance. The simulation results demonstrate the high performance of the space-target-based INS/Vision integrated navigation method. Further research will focus on improving the navigation scheme to address the unfavorable effects of the space target position error and of the optical measurement error caused by the thermal environment on the performance of the system.

Author Contributions

Conceptualization, K.X. and C.W.; methodology, K.X.; software, P.Z.; validation, K.X., P.Z. and C.W.; formal analysis, K.X.; writing—original draft preparation, K.X.; writing—review and editing, P.Z.; supervision, C.W.; project administration, K.X.; funding acquisition, C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Civil Aerospace Advance Research Project, grant number D020403, and the National Natural Science Foundation of China, grant number U21B6001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Theorem 1.
First, to facilitate the analysis of the learning property, consider the specific case where $\gamma = 1$. With the initial value $\bar{Q}_0(s,a) = Q_0(s,a)$, the iterative Q function shown in (57) can be written as:
$$\bar{Q}_{i+1}(s,a) = \bar{Q}_i(s,a) + \alpha\left[R(s,a) + \max_{a'} \bar{Q}_i(s',a') - \bar{Q}_i(s,a)\right] \tag{A1}$$
Mathematical induction is adopted to prove the following statement:
$$\left[1 + \left(\prod_{j=0}^{i-1}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\underline{\delta} - 1)\right]\bar{Q}(s,a) \le \bar{Q}_i(s,a) \le \left[1 + \left(\prod_{j=0}^{i-1}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\bar{\delta} - 1)\right]\bar{Q}(s,a) \tag{A2}$$
For $i = 0$, from (A1) and the condition shown in (59), we have:
$$\bar{Q}_1(s,a) = \bar{Q}_0(s,a) + \alpha\left[R(s,a) + \max_{a'} \bar{Q}_0(s',a') - \bar{Q}_0(s,a)\right] \le (1-\alpha)\,\bar{\delta}\,\bar{Q}(s,a) + \alpha\left[R(s,a) + \bar{\delta}\max_{a'} \bar{Q}(s',a')\right] \tag{A3}$$
From (60), it can be verified that:
$$R(s,a) + \bar{\delta}\max_{a'} \bar{Q}(s',a') \le \left[1 + \frac{\lambda(\bar{\delta}-1)}{\lambda+1}\right]\left[R(s,a) + \max_{a'} \bar{Q}(s',a')\right] \tag{A4}$$
Substituting (61) into (A4) yields:
$$R(s,a) + \bar{\delta}\max_{a'} \bar{Q}(s',a') \le \left[1 + \frac{\lambda(\bar{\delta}-1)}{\lambda+1}\right]\bar{Q}(s,a) \tag{A5}$$
It is derived from (A3) and (A5) that:
$$\bar{Q}_1(s,a) \le \left[1 + (\bar{\delta}-1)\left(1 - \frac{\alpha}{\lambda+1}\right)\right]\bar{Q}(s,a) \tag{A6}$$
Similarly, we can obtain:
$$\bar{Q}_1(s,a) \ge \left[1 + (\underline{\delta}-1)\left(1 - \frac{\alpha}{\lambda+1}\right)\right]\bar{Q}(s,a) \tag{A7}$$
According to (A6) and (A7), the inequality (A2) holds for $i = 1$. Assume that the inequality (A2) holds for $i = l$, i.e.,
$$\left[1 + \left(\prod_{j=0}^{l-1}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\underline{\delta} - 1)\right]\bar{Q}(s,a) \le \bar{Q}_l(s,a) \le \left[1 + \left(\prod_{j=0}^{l-1}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\bar{\delta} - 1)\right]\bar{Q}(s,a) \tag{A8}$$
From (A1) and the condition shown in (59), we obtain:
$$\bar{Q}_{l+1}(s,a) \le (1-\alpha)\left[1 + \left(\prod_{j=0}^{l-1}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\bar{\delta} - 1)\right]\bar{Q}(s,a) + \alpha\left\{R(s,a) + \left[1 + \left(\prod_{j=0}^{l-1}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\bar{\delta} - 1)\right]\max_{a'} \bar{Q}(s',a')\right\} \tag{A9}$$
From (60), it can be verified that:
$$R(s,a) + \left[1 + \left(\prod_{j=0}^{l-1}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\bar{\delta} - 1)\right]\max_{a'} \bar{Q}(s',a') \le \left[1 + \frac{\lambda}{\lambda+1}\left(\prod_{j=0}^{l-1}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\bar{\delta} - 1)\right]\left[R(s,a) + \max_{a'} \bar{Q}(s',a')\right] \tag{A10}$$
Inserting (61) into (A10), we have:
$$R(s,a) + \left[1 + \left(\prod_{j=0}^{l-1}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\bar{\delta} - 1)\right]\max_{a'} \bar{Q}(s',a') \le \left[1 + \frac{\lambda}{\lambda+1}\left(\prod_{j=0}^{l-1}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\bar{\delta} - 1)\right]\bar{Q}(s,a) \tag{A11}$$
It follows from (A9) and (A11) that:
$$\bar{Q}_{l+1}(s,a) \le \left[1 + \left(\prod_{j=0}^{l}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\bar{\delta} - 1)\right]\bar{Q}(s,a) \tag{A12}$$
Similarly, we can obtain:
$$\bar{Q}_{l+1}(s,a) \ge \left[1 + \left(\prod_{j=0}^{l}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\underline{\delta} - 1)\right]\bar{Q}(s,a) \tag{A13}$$
From (A12) and (A13), the inequality (A2) holds for $i = l + 1$. Hence, for $i = 0, 1, 2, \ldots$, the statement (A2) holds. The mathematical induction is complete.
As $0 < \alpha < 1$ and $\lambda \ge 0$, we have:
$$\lim_{i\to\infty} \prod_{j=0}^{i-1}\left(1 - \frac{\alpha}{\lambda+1}\right) = 0 \tag{A14}$$
This leads to:
$$\lim_{i\to\infty}\left[1 + \left(\prod_{j=0}^{i-1}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\underline{\delta} - 1)\right] = \lim_{i\to\infty}\left[1 + \left(\prod_{j=0}^{i-1}\left(1 - \frac{\alpha}{\lambda+1}\right)\right)(\bar{\delta} - 1)\right] = 1 \tag{A15}$$
It is derived from (A2) and (A15) that:
$$\lim_{i\to\infty} \bar{Q}_i(s,a) = \bar{Q}(s,a) \tag{A16}$$
Secondly, the following statement is proven by mathematical induction:
$$Q_{i+1}(s,a) \ge Q_i(s,a) \tag{A17}$$
For $i = 0$, from (57) and the condition shown in (58), we get:
$$Q_1(s,a) = (1-\alpha)Q_0(s,a) + \alpha\left[R(s,a) + \gamma \max_{a'} Q_0(s',a')\right] \ge (1-\alpha)Q_0(s,a) + \alpha Q_0(s,a) = Q_0(s,a) \tag{A18}$$
Thus, the statement (A17) holds for $i = 0$. Assume that the inequality (A17) holds for $i = l - 1$, i.e.,
$$Q_l(s,a) \ge Q_{l-1}(s,a) \tag{A19}$$
It is derived from (57) and (A19) that:
$$Q_{l+1}(s,a) = (1-\alpha)Q_l(s,a) + \alpha\left[R(s,a) + \gamma \max_{a'} Q_l(s',a')\right] \ge (1-\alpha)Q_{l-1}(s,a) + \alpha\left[R(s,a) + \gamma \max_{a'} Q_{l-1}(s',a')\right] = Q_l(s,a) \tag{A20}$$
It is seen from (A20) that the statement (A17) holds for $i = l$. Hence, for $i = 0, 1, 2, \ldots$, the statement (A17) holds. The mathematical induction is complete.
Thirdly, mathematical induction is used to prove the following statement:
$$Q_i(s,a) \le \bar{Q}_i(s,a) \tag{A21}$$
For $i = 0$, from (57), (A1) and the value range of the parameter $0 < \gamma \le 1$, we have:
$$Q_1(s,a) = (1-\alpha)Q_0(s,a) + \alpha\left[R(s,a) + \gamma \max_{a'} Q_0(s',a')\right] \le (1-\alpha)Q_0(s,a) + \alpha\left[R(s,a) + \max_{a'} Q_0(s',a')\right] = (1-\alpha)\bar{Q}_0(s,a) + \alpha\left[R(s,a) + \max_{a'} \bar{Q}_0(s',a')\right] = \bar{Q}_1(s,a) \tag{A22}$$
Thus, the statement (A21) holds for $i = 0$. Assume that the inequality (A21) holds for $i = l - 1$, i.e.,
$$Q_l(s,a) \le \bar{Q}_l(s,a) \tag{A23}$$
It follows from (57) and (A23) that:
$$Q_{l+1}(s,a) = (1-\alpha)Q_l(s,a) + \alpha\left[R(s,a) + \gamma \max_{a'} Q_l(s',a')\right] \le (1-\alpha)Q_l(s,a) + \alpha\left[R(s,a) + \max_{a'} Q_l(s',a')\right] \le (1-\alpha)\bar{Q}_l(s,a) + \alpha\left[R(s,a) + \max_{a'} \bar{Q}_l(s',a')\right] = \bar{Q}_{l+1}(s,a) \tag{A24}$$
Thus, the statement (A21) holds for $i = l$. According to (A22) and (A24), for $i = 0, 1, 2, \ldots$, the statement (A21) holds. The mathematical induction is complete.
From (A17), (A21) and (A16), there exists a limit of $Q_i(s,a)$. Let
$$\lim_{i\to\infty} Q_i(s,a) = Q_\infty(s,a) \tag{A25}$$
Finally, we prove the following statement by contradiction:
$$Q_\infty(s,a) = R(s,a) + \gamma \max_{a'} Q_\infty(s',a') \tag{A26}$$
Assume that (A26) is false. Define the function:
$$\varepsilon(s,a) = R(s,a) + \gamma \max_{a'} Q_\infty(s',a') - Q_\infty(s,a) \tag{A27}$$
Let $\epsilon$ be a small positive constant such that the signs of $\varepsilon(s,a) + \epsilon$ and $\varepsilon(s,a) - \epsilon$ are the same. Considering (A25), there exists a positive integer $N$ such that the following inequality holds for $i > N$:
$$\varepsilon(s,a) + \epsilon \ge R(s,a) + \gamma \max_{a'} Q_i(s',a') - Q_i(s,a) \ge \varepsilon(s,a) - \epsilon \tag{A28}$$
According to (57), the iterative Q function is written as:
$$Q_1(s,a) = Q_0(s,a) + \alpha\left[R(s,a) + \gamma \max_{a'} Q_0(s',a') - Q_0(s,a)\right] \tag{A29}$$
$$Q_2(s,a) = Q_0(s,a) + \alpha\sum_{j=0}^{1}\left[R(s,a) + \gamma \max_{a'} Q_j(s',a') - Q_j(s,a)\right] \tag{A30}$$
$$\vdots$$
$$Q_{i+1}(s,a) = Q_0(s,a) + \alpha\sum_{j=0}^{i}\left[R(s,a) + \gamma \max_{a'} Q_j(s',a') - Q_j(s,a)\right] \tag{A31}$$
Similarly, $Q_\infty(s,a)$ can be expressed as:
$$Q_\infty(s,a) = Q_0(s,a) + \alpha\sum_{j=0}^{N-1}\left[R(s,a) + \gamma \max_{a'} Q_j(s',a') - Q_j(s,a)\right] + \alpha\sum_{j=N}^{\infty}\left[R(s,a) + \gamma \max_{a'} Q_j(s',a') - Q_j(s,a)\right] \tag{A32}$$
From (A28) and (A32), we obtain:
$$Q_0(s,a) + \alpha\sum_{j=0}^{N-1}\left[R(s,a) + \gamma \max_{a'} Q_j(s',a') - Q_j(s,a)\right] + \alpha\sum_{j=N}^{\infty}\left[\varepsilon(s,a) + \epsilon\right] \ge Q_\infty(s,a) \ge Q_0(s,a) + \alpha\sum_{j=0}^{N-1}\left[R(s,a) + \gamma \max_{a'} Q_j(s',a') - Q_j(s,a)\right] + \alpha\sum_{j=N}^{\infty}\left[\varepsilon(s,a) - \epsilon\right] \tag{A33}$$
According to (A33), if $\varepsilon(s,a) > 0$, then $Q_\infty(s,a) \to \infty$. On the contrary, if $\varepsilon(s,a) < 0$, then $Q_\infty(s,a) \to -\infty$. This contradicts the fact that the limit of $Q_i(s,a)$ exists as $i \to \infty$. Thus, the assumption is false, and Equation (A26) holds. Note that the formulation of (A26) is essentially the same as that of (63). □

References

  1. Chen, K.; Zhou, J.; Shen, F.Q.; Sun, H.Y.; Fan, H. Hypersonic boost-glide vehicle strapdown inertial navigation system/global positioning system algorithm in a launch-centered earth-fixed frame. Aerosp. Sci. Technol. 2020, 98, 1–17. [Google Scholar] [CrossRef]
  2. Hu, G.; Gao, B.; Zhong, Y.; Ni, L.; Gu, C. Robust unscented kalman filtering with measurement error detection for tightly coupled INS/GNSS integration in hypersonic vehicle navigation. IEEE Access 2019, 7, 151409–151421. [Google Scholar] [CrossRef]
  3. Cheng, X.; Yang, Y.; Hao, Q. Analysis of the effects of thermal environment on optical systems for navigation guidance and control in supersonic aircraft based on empirical equations. Sensors 2016, 16, 1717. [Google Scholar] [CrossRef] [PubMed]
  4. Kahn, A.D.; Edwards, D.J. Navigation, guidance, and control of a micro unmanned aerial glider. J. Guid. Control Dyn. 2019, 42, 2474–2484. [Google Scholar] [CrossRef]
  5. Sun, J.; Blom, H.A.P.; Ellerbroek, J.; Hoekstra, J.M. Particle filter for aircraft mass estimation and uncertainty modeling. Transp. Res. Part C 2019, 105, 145–162. [Google Scholar] [CrossRef]
  6. Hiba, A.; Gáti, A.; Manecy, A. Optical navigation sensor for runway relative positioning of aircraft during final approach. Sensors 2021, 21, 2203. [Google Scholar] [CrossRef]
  7. Hu, G.; Ni, L.; Gao, B.; Zhu, X.; Wang, W.; Zhong, Y. Model predictive based unscented Kalman filter for hypersonic vehicle navigation with INS/GNSS integration. IEEE Access 2020, 8, 4814–4823. [Google Scholar] [CrossRef]
  8. Pan, W.; Zhan, X.; Zhang, X. Fault exclusion method for ARAIM based on tight GNSS/INS integration to achieve CAT-I approach. IET Radar Sonar Navig. 2019, 13, 1909–1917. [Google Scholar] [CrossRef]
  9. Krasuski, K.; Cwiklak, J.; Jafernik, H. Aircraft positioning using PPP method in GLONASS system. Aircr. Eng. Aerosp. Technol. 2018, 90, 1413–1420. [Google Scholar] [CrossRef]
  10. Jurevicius, R.; Marcinkevicius, V.; Šeibokas, J. Robust GNSS-denied localization for UAV using particle filter and visual odometry. Mach. Vis. Appl. 2019, 30, 1181–1190. [Google Scholar] [CrossRef]
  11. Chen, D.; Neusypin, K.; Selezneva, M.; Mu, Z. New algorithms for autonomous inertial navigation systems correction with precession angle sensors in aircrafts. Sensors 2019, 19, 5016. [Google Scholar] [CrossRef]
  12. Lei, X.; Liu, X. A high performance altitude navigation system for small rotorcraft unmanned aircraft. Mechatronics 2019, 62, 102248. [Google Scholar] [CrossRef]
  13. Gui, M.; Zhao, D.; Ning, X.; Zhang, C.; Dai, M. A time delay/ star angle integrated navigation method based on the event-triggered implicit unscented Kalman filter. IEEE Trans. Instrum. Meas. 2021, 70, 1–10. [Google Scholar] [CrossRef]
  14. Zhang, H.; Jiao, R.; Xu, L.; Xu, C.; Mi, P. Formation of a satellite navigation system using X-ray pulsars. Publ. Astron. Soc. Pac. 2019, 131, 045002. [Google Scholar]
  15. Wang, T.; Somani, A.K. Aerial-DEM geolocalization for GPS-denied UAS navigation. Mach. Vis. Appl. 2020, 31, 3. [Google Scholar] [CrossRef]
  16. Ye, L.; Yang, Y.; Ma, J.; Deng, L.; Li, H. Research on an LEO constellation multi-aircraft collaborative navigation algorithm based on a dual-way asynchronous precision communication-time service measurement system (DWAPC-TSM). Sensors 2022, 22, 3213. [Google Scholar] [CrossRef]
  17. Wang, R.; Xiong, Z.; Liu, J.; Shi, L. A new tightly-coupled INS/CNS integrated navigation algorithm with weighted multi-stars observations. Proc. Inst. Mech. Eng. Part G J. Aerosp. Eng. 2016, 230, 698–712. [Google Scholar] [CrossRef]
  18. Ning, X.; Yuan, W.; Liu, Y. A tightly coupled rotational SINS/CNS integrated navigation method for aircraft. J. Syst. Eng. Electron. 2019, 30, 770–782. [Google Scholar]
  19. Chen, H.; Gao, H.; Zhang, H. Integrated navigation approaches of vehicle aided by the strapdown celestial angles. Int. J. Adv. Robot. Syst. 2020, 5, 1–12. [Google Scholar] [CrossRef]
  20. Xiong, K.; Wei, C.; Liu, L. Autonomous navigation for a group of satellites with star sensors and inter-satellite links. Acta Astronaut. 2013, 86, 10–23. [Google Scholar]
  21. Huang, L.; Song, J.; Zhang, C.; Cai, G. Observable modes and absolute navigation capability for landmark-based IMU/Vision navigation system of UAV. Optik 2020, 202, 1–19. [Google Scholar] [CrossRef]
  22. Kim, Y.; Bang, H. Vision-based navigation for unmanned aircraft using ground feature points and terrain elevation data. Proc. Inst. Mech. Eng. Part G J. Aerosp. Eng. 2018, 232, 1334–1346. [Google Scholar] [CrossRef]
  23. Christian, J.A. Optical navigation using planet’s centroid and apparent diameter in image. J. Guid. Control Dyn. 2015, 38, 192–204. [Google Scholar] [CrossRef]
  24. Chang, J.; Geng, Y.; Guo, J.; Fan, W. Calibration of satellite autonomous navigation based on attitude sensor. J. Guid. Control Dyn. 2017, 40, 185–191. [Google Scholar] [CrossRef]
  25. Lefferts, E.J.; Markley, F.L.; Shuster, M.D. Kalman filtering for spacecraft attitude estimation. J. Guid. Control Dyn. 1982, 5, 417–429. [Google Scholar] [CrossRef]
  26. Markley, F.L. Attitude error representations for Kalman filtering. J. Guid. Control Dyn. 2003, 63, 311–317. [Google Scholar] [CrossRef]
  27. Xiong, K.; Wei, C. Integrated celestial navigation for spacecraft using interferometer and earth sensor. Proc. Inst. Mech. Eng. Part G J. Aerosp. Eng. 2020, 234, 2248–2262. [Google Scholar] [CrossRef]
  28. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; The MIT Press: London, UK, 2018. [Google Scholar]
  29. Gosavi, A. Reinforcement learning: A tutorial survey and recent advances. INFORMS J. Comput. 2009, 21, 178–192. [Google Scholar] [CrossRef]
  30. Xu, X.; Zuo, L.; Huang, Z. Reinforcement learning algorithms with function approximation: Recent advance and applications. Inf. Sci. 2014, 261, 1–31. [Google Scholar] [CrossRef]
  31. Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292. [Google Scholar] [CrossRef]
  32. Wei, Q.; Lewis, F.L.; Sun, Q.; Yan, P.; Song, R. Discrete-time deterministic Q-learning: A novel convergence analysis. IEEE Trans. Cybern. 2017, 47, 1224–1237. [Google Scholar] [CrossRef]
  33. Luo, B.; Wu, H.N.; Huang, T. Optimal output regulation for model-free quanser helicopter with multistep Q-learning. IEEE Trans. Ind. Electron. 2018, 65, 4953–4961. [Google Scholar] [CrossRef]
  34. Xiong, K.; Wei, C.; Zhang, H. Q-learning for noise covariance adaptation in extended Kalman filter. Asian J. Control 2021, 23, 1803–1816. [Google Scholar] [CrossRef]
  35. Ding, L.; Yang, Q. Research on air combat maneuver decision of UAVs based on reinforcement learning. Avion. Technol. 2018, 49, 29–35. [Google Scholar]
  36. Ristic, B.; Farina, A.; Benvenuti, D.; Arulampalam, M.S. Performance bounds and comparison of nonlinear filters for tracking a ballistic object on re-entry. IEE P-Radar Sonar Navig. 2003, 150, 65–70. [Google Scholar] [CrossRef]
  37. Lei, M.; Wyk, B.J.; Qi, Y. Online estimation of the approximate posterior cramer-rao lower bound for discrete-time nonlinear filtering. IEEE Trans. Aerosp. Electron. Syst. 2011, 47, 37–57. [Google Scholar] [CrossRef]
  38. Xiong, K.; Liu, L. Design of parallel adaptive extended Kalman filter for online estimation of noise covariance. Aircr. Eng. Aerosp. Technol. 2019, 91, 112–123. [Google Scholar] [CrossRef]
Figure 1. The measurement of the space target LOS vector.
Figure 2. Space-target-based INS/Vision integrated navigation method.
Figure 3. Relative positions of the Sun, space target and unmanned aircraft.
Figure 4. Earth's shadow in the orbit of the space target.
Figure 5. Relative positions of the Earth, aircraft and target.
Figure 6. Main components of the iterative learning procedure.
Figure 7. Satellite constellation in the simulation scenario.
Figure 8. Apparent magnitudes of the space targets.
Figure 9. Sun–Earth–target angles and the corresponding bounds.
Figure 10. Earth–aircraft–target angles and the corresponding bounds.
Figure 11. Sun–aircraft–target angles and the corresponding bounds.
Figure 12. CRLB for the position estimation with different accelerometer noise levels.
Figure 13. CRLB for the position estimation with different star camera noise levels.
Figure 14. Position error of space-target-based INS/Vision integrated navigation.
Figure 15. Velocity error of space-target-based INS/Vision integrated navigation.
Figure 16. Attitude error of space-target-based INS/Vision integrated navigation.
Figure 17. RMSE of the position estimation for EKF, AEKF and QLEKF.
Figure 18. RMSE of the attitude estimation for EKF, AEKF and QLEKF.
Table 1. Basic simulation configuration.

Initial condition:
- Initial position error: (200 m, 200 m, 200 m)
- Initial velocity error: (0.02 m/s, 0.02 m/s, 0.02 m/s)
- Initial attitude error: (0.01°, 0.01°, 0.01°)
- Duration of navigation process: 2 h

Measurement performance:
- Accuracy of gyroscope: 0.02°/h
- Accuracy of accelerometer: 100 μg
- Accuracy of star camera: 5″
- Sensitivity of star camera: 8 Mv
- Measurement update frequency: 1 Hz
- Observed targets in each period: 1
- Total number of space targets: 180

QLEKF parameters:
- Learning rate $\alpha$: 0.1
- Discount factor $\gamma$: 0.9
- Parameter of basis function $\mu_{ad}$: (−2, −1, 0, 1, 2)
- Parameter of basis function $\sigma_{ad}$: 0.3
- Tuning of filtering parameter: $\hat{Q}_{P,k}(s,a) = 10^{2a}\,Q_{P,k}$