You are currently viewing a new version of our website. To view the old version click .
Sensors
  • Article
  • Open Access

24 September 2024

Calibration Method for Relativistic Navigation System Using Parallel Q-Learning Extended Kalman Filter

,
and
1
Science and Technology on Space Intelligent Control Laboratory, Beijing Institute of Control Engineering, Beijing 100094, China
2
China Academy of Space Technology, Beijing 100094, China
*
Author to whom correspondence should be addressed.
This article belongs to the Section Navigation and Positioning

Abstract

For the relativistic navigation system where the position and velocity of the spacecraft are determined through the observation of the relativistic perturbations including stellar aberration and starlight gravitational deflection, a novel parallel Q-learning extended Kalman filter (PQEKF) is presented to implement the measurement bias calibration. The relativistic perturbations are extracted from the inter-star angle measurement achieved with a group of high-accuracy star sensors on the spacecraft. Inter-star angle measurement bias caused by the misalignment of the star sensors is one of the main error sources in the relativistic navigation system. In order to suppress the unfavorable effect of measurement bias on navigation performance, the PQEKF is developed to estimate the position and velocity, together with the calibration parameters, where the Q-learning approach is adopted to fine tune the process noise covariance matrix of the filter automatically. The high performance of the presented method is illustrated via numerical simulations in the scenario of medium Earth orbit (MEO) satellite navigation. The simulation results show that, for the considered MEO satellite and the presented PQEKF algorithm, in the case that the inter-star angle measurement accuracy is about 1 mas, after calibration, the positioning accuracy of the relativistic navigation system is less than 300 m.

1. Introduction

Spacecraft navigation is an enabling technology for a wide variety of space missions, such as Earth satellites and deep space explorers. Currently, the commonly used navigation approach is the radio navigation based on the radio signal sent from beacons, such as ground stations and the global navigation satellite system (GNSS) [1,2]. To reduce the mission cost and improve the autonomous survival capacity, the autonomous navigation system that determines the position and velocity of the spacecraft with onboard instruments in a radio signal-denied environment is required [3,4,5]. To achieve precise navigation information for the spacecraft without the support of man-made beacons is critical for the development of future intelligent unmanned systems [6,7].
In the past few decades, several autonomous navigation techniques with different observation information sources have been studied, such as optical navigation (OPNAV) using the optical imaging of nearby celestial bodies [8,9,10], X-ray pulsar-based navigation (XNAV) [11,12,13] and star navigation based on the Doppler effect of starlight (StarNAV-DE) [14,15,16]. In the on-orbit demonstrations, the positioning accuracy of the OPNAV based on the observations of Earth is on the order of a few kilometers, while the accuracy of XNAV is less than 10 km, which is not sufficient to satisfy the high-precision navigation requirement for certain space missions. The StarNAV-DE technique has been demonstrated on the Chinese Hα Solar Explorer (CHASE). It is reported that the accuracy of the solar velocimeter observing the starlight Doppler effect on CHASE is about 2 m/s.
The spacecraft autonomous navigation method based on the relativistic perturbations of starlight is introduced in [17] and developed in [18,19]. Recently, the investigation relevant to the relativistic navigation has attracted increasing attention. A practical mathematical model to describe the relativistic perturbations to the space-based starlight observation is derived in [20]. The optical instruments for the observation of the relativistic perturbations are discussed in [21]. The application of relativistic navigation is suggested in [22] for interstellar spacecraft with high velocity such that the relativistic perturbations are not negligible. To enhance navigation accuracy and rapidity, the information fusion scheme of the relativistic navigation and the OPNAV are designed in [23,24]. The extended Kalman filter (EKF) and the unscented Kalman filter (UKF) are designed and evaluated for the implementation of the relativistic navigation in [25,26].
Among the previously mentioned autonomous navigation techniques, the relativistic navigation based on the relativistic perturbations to the inter-star angle measurement has the potential to achieve higher performance with current technology. Generally, the relativistic navigation performance depends on the measurement accuracy of the inter-star angle and the precision of the star catalog. As the inter-star angle can be measured with the accuracy of a few mas with state-of-the-art instruments, and the error of the modern star catalog is less than 0.1 mas, it is considered that relativistic navigation is a promising method to achieve high performance. In comparison with the OPNAV, an advantage of relativistic navigation is that the high-accuracy observation of starlight is generally easier than that of a nearby celestial body. Compared with the XNAV, relativistic navigation is competitive, as the number of visible stars is much more than the X-ray pulsars suitable for navigation. In addition, as the stability of the inter-star angle calculated with the star catalog is rather high, the main difficulty of the StarNAV-DE technique due to the poor stability of the stellar spectra is avoided.
In the relativistic navigation system, at least two star sensors separated from each other by a large angle are required to measure the inter-star angles, which reveal the variations in the relativistic perturbations, including stellar aberration and starlight gravitational deflection. The inter-star angle measurement bias caused by the misalignment of the star sensors strongly affects the performance of the relativistic navigation system. The main motivation of this study is to calibrate the measurement bias accurately via a fine-designed navigation filter. A common approach is to model the measurement bias as calibration parameters, which can be estimated together with the position and velocity of the spacecraft through the EKF.
It is well known that the state estimation accuracy of the EKF depends on the tuning of the process and measurement noise covariance matrices [27,28]. As the measurement noise covariance matrix can be determined through the specification of the star sensors, the problem remains in determining the process noise covariance matrix, especially for the elements related to the calibration parameters. Generally, it is difficult to obtain the optimal noise covariance matrices in the absence of exact statistical knowledge about the process noise. Several attempts have been made in the literature to develop adaptive filters [29,30,31]. For the study of autonomous navigation, the most widely used method is the adaptive extended Kalman filter (AEKF), where the noise covariance matrices are estimated together with the state vector. However, it is often difficult to guarantee the estimation accuracy of the noise covariance matrix in the presence of the state estimation error. To cope with this problem, a potential method is to combine the Q-learning approach with the EKF for tuning the process noise covariance matrix automatically [32,33,34,35,36]. The key idea of the parallel Q-learning extended Kalman filter (PQEKF) is the integration of the EKF and the Q-learning approach, where the process noise covariance matrix of the EKF is selected with the Q-learning approach, whose reward is constructed with the innovation of the EKF, such that the appropriated covariance matrix is determined to improve the filtering accuracy.
This paper studies a measurement bias calibration method for a relativistic navigation system. The main contributions of this study are as follows: (1) The PQEKF is presented to adjust the process noise covariance matrix related to the calibration parameters and that related to the position and velocity vectors, respectively. The PQEKF is different from its original version presented in [24] in that two learning agents are designed to work in parallel such that the flexibility of the algorithm is improved. (2) It is illustrated that the PQEKF is effective for calibrating the inter-star angle measurement bias of the star sensors. The simulation shows that, after calibration, the relativistic navigation accuracy for the MEO satellite is on the order of 300 m in the case that the standard deviation of the measurement noise is about 1 mas. (3) The principle of the presented method can be further applied to cope with other state estimation problems that require autonomous parameters tuning.
The remaining part of the paper is organized as follows: Section 2 formulates the mathematical model of the relativistic navigation system. Section 3 presents the PQEKF algorithm for the relativistic navigation system to estimate the calibration parameters. Section 4 evaluates the performance of the navigation filter via simulations. Finally, Section 5 concludes the paper.

2. Relativistic Navigation System Model

2.1. Basic Principle of Relativistic Navigation

The concept of the relativistic navigation is illustrated in Figure 1.
Figure 1. Concept of relativistic navigation.
The basic principle is to estimate the position and velocity of the spacecraft based on the inter-star angle measurement, which is related to the relativistic perturbations including the stellar aberration and the starlight gravitational deflection. Considering that the effect of the stellar aberration and the starlight gravitational deflection to the inter-star angle measurement can be written as the function of the spacecraft velocity and position, respectively, the velocity and position information of the spacecraft can be extracted from the inter-star angle measurement. The derivation about the effect of relativistic perturbations to starlight is shown in [17]. As a basic of the navigation filter design, the observability analysis of the relativistic navigation system can be found in [24]. The spacecraft attitude determination method based on the stellar measurement is omitted here as it has been widely studied in literature, such as in [37].
As shown in Figure 2, a group of star sensors on a rigid platform are used to obtain the inter-star angle measurement. Then, the navigation filter is implemented to incorporate the orbital dynamics and the inter-star angle measurement, such that the position and velocity of the spacecraft are estimated. As the inter-star angle measurement bias caused by the misalignment of various star sensors may seriously degrade the navigation performance, the calibration method is required to estimate and compensate for the measurement bias. With accurate star sensors, a precise star catalog and a high-fidelity orbital dynamics model, high navigation performance is achievable if the calibration method is designed appropriately.
Figure 2. Diagram of relativistic navigation and calibration method.

2.2. Dynamic Model

The dynamic model and the measurement model of the relativistic navigation system are constructed for the design of the navigation filter. The state vector is composed of the position vector r k = r x k r y k r z k T , the velocity vector v k = v x k v y k v z k T and the calibration parameter vector b k = [ b i j k ] T , which is shown as
x k = r k T v k T b k T T
where the sub-label k denotes the discrete time, and i and j are used to distinguish different stars. The position and velocity vectors are defined in the Earth-centered inertial coordinate system.
For the Earth satellite, the dynamic model to describe the time evolution of the state vector is written as
x k = f x k 1 + w k
with
f x k = x k + ϕ ( x k ) τ
ϕ x k = v k μ E r k r k 3 + p ( r k ) 0 m × 1
where τ is the step size, μ E is the gravitational constant of Earth and · denotes the Euclidean norm of a vector. The function p ( r k ) denotes the accelerations of the spacecraft generated by perturbations other than the central gravity of Earth, such as the non-spherical gravity of Earth, atmospheric drag, sunlight pressure and lunar and solar gravity. The expression of the function p ( r k ) can be found in [1]. The un-modeled perturbations are considered to be relatively small and can be absorbed in the process noise w k , which is assumed as a zero-mean noise with the covariance matrix Q k . Q k as a symmetric and positive definite matrix.

2.3. Measurement Model

Considering the stellar aberration, the starlight gravitational deflection and the measurement bias, when multiple stars are observed simultaneously, the measurement model to describe the relation between the inter-star angle measurement and the state vector is formulated as
y k = h x k + ν k
with
y k = y i j k ,   h x k = h i j ( x k )
where y k is the measurement vector, ν k is the measurement noise with a covariance matrix R k , R k is a symmetric and positive definite matrix and y i j k is the measurement of the angle between two stars distinguished by i and j . The elements h i j ( x k ) in the function vector h x k are expressed as
h i j x k = u I i k T u I j k + 1 c 1 u I i k T u I j k v k + v E , k T u I i k + ( v k + v E , k ) T u I j k       1   c 2 1 u I i k T u I j k [ ( v k + v E , k ) T u I i k 2 + ( v k + v E , k ) T u I j k 2 + ( v k + v E , k ) T u I i k ( v k + v E , k ) T u I j k v k + v E , k T ( v k + v E , k ) ] + b i j k
where c is the speed of light, v E , k is the velocity vector of Earth’s center relative to the solar system barycenter (SSB) and u I i k is the line-of-sight (LOS) vector of the i th star as seen by a fictitious stationary observer. The expression of u I i k is given by
u I i k = u I i k + δ u I i k
where u I i k is the unit LOS vector of the i th star in the absence of the gravitational field, which is calculated from the star catalog. Δ u I i k denotes the effect of Earth’s gravitational field to the unit LOS vector of the i th star, which is described as
δ u I i k = 2 μ E c 2 1 u I i k T r k / r k ( I 3 × 3 u I i k u I i k T ) r k ( I 3 × 3 u I i k u I i k T ) r k 2
Note that the measurement bias b i j k is modeled as the calibration parameter. The derivation of the Jacobian matrix H k for the measurement model shown in (5) is similar to that in [24].

4. Simulations

4.1. Simulation Conditions

In this section, comparisons are performed to demonstrate the efficiency of the calibration method for the relativistic navigation system using the PQEKF. The reference trajectory of the spacecraft is generated through a high precision orbit propagator, where non-spherical Earth gravity perturbation, lunar–solar gravitational perturbation and solar radiation pressure perturbation are considered. Assume that the spacecraft is an MEO satellite in a near-circular orbit with a semi-major axis of 21,528 km and inclination of 55°. The measurement data are generated according to the reference trajectory and the measurement model shown in Section 2. The navigation filters designed based on the EKF and the PQEKF presented in Section 3 are implemented individually to process the measurement data. The position and velocity estimation errors are obtained via comparison between the state estimation and the reference trajectory.
For the fairness of comparison, the EKF and the PQEKF share the same measurement noise covariance matrix R k and the initial estimation error covariance matrix P 0 . The parameter settings for the simulation are listed in Table 1.
Table 1. Simulation parameter settings.
For the PQEKF, when discretizing the state space, since the range of the state space is unknown, the upper limit and lower limit of the process noise covariance matrix is obtained through experiments. The performance of the presented methods is evaluated via the position and velocity estimation errors, which are critical for the orbital control of the spacecraft.

4.2. Simulation Results

First, the navigation performance of the presented method is compared with that of the traditional EKF without measurement bias calibration [24]. The three-axis position and velocity estimation error curves of the spacecraft obtained from the EKF without measurement bias calibration are shown in Figure 4 and Figure 5 with solid line, where the dashed lines represent the theoretic error bounds calculated from the estimation error covariance matrix of the navigation filter.
Figure 4. Position estimation error of traditional EKF without measurement bias calibration.
Figure 5. Velocity estimation error of traditional EKF without measurement bias calibration.
It is seen from Figure 4 and Figure 5 that the error curves fluctuate out of the theoretic error bounds frequently due to the unfavorable effect of the measurement bias. In contrast, the position and velocity estimation error curves of the calibration method based on the PQEKF are shown in Figure 6 and Figure 7.
Figure 6. Position estimation error of calibration method based on PQEKF.
Figure 7. Velocity estimation error of calibration method based on PQEKF.
From Figure 6 and Figure 7, it can be seen that all of the error curves are contained in the corresponding error bounds, which indicates the effectiveness for the design of the navigation filter.
Second, to facilitate the performance comparison of the algorithms in different simulation conditions, the position and velocity average root mean squared (RMS) errors of the EKF without bias calibration, the EKF with bias calibration and the presented method are plotted versus different settings of the measurement bias in Figure 8 and Figure 9.
Figure 8. Position RMS errors of different methods vs. measurement bias.
Figure 9. Velocity RMS errors of different methods vs. measurement bias.
It is easy to see from Figure 8 and Figure 9 that the estimation error of the EKF without bias calibration is enlarged with the increase in the measurement bias, while the effect of the measurement bias on the navigation performance is suppressed efficiently when the EKF with bias calibration is adopted. In addition, the calibration method based on the PQEKF achieves superior performance due to its ability in selecting the suitable process noise covariance matrix. It indicates that the presented method is not sensitive to the inter-star angle measurement bias.
In addition, the effect of the measurement noise on the PQEKF algorithm is examined through simulations. When the standard deviation of the measurement noise is changed in the scopes of [0.6, 1.6] mas, the RMS position errors of the EKF and the PQEKF are illustrated in Figure 10. The simulation result shows that, in comparison with the EKF, the PQEKF is more effective for suppressing the unfavorable effect of the measurement noise.
Figure 10. RMS errors as functions of measurement noise standard deviation.
It is seen from Algorithm 2 that the PQEKF contains multiple EKFs. In the simulation, the execution time of the PQEKF is several times larger than that of the EKF. Nevertheless, it is easy to complete the one step iteration of the PQEKF algorithm in an observation period of the star sensors. For the considered system, to reduce the computational cost of the PQEKF, the state space or the action space of the Q-learning approach could be further optimized. For a complicated practical system with a large state space or action space, artificial neural network approximation or dedicated hardware can be introduced for the implementation of the algorithm. In addition, it is expected that a dynamic state space with the bound stretched automatically can be designed in future works.
Next, as the PQEKF is an improved version of the QLEKF, it is compared with the QLEKF for the relativistic navigation system through Monte Carlo trials. The position RMS error curves of the calibration methods based on the EKF, the AEKF, the QLEKF and the PQEKF are plotted in Figure 11. The statistical values of the navigation accuracy for the different methods are summarized in Table 2.
Figure 11. Position RMS error curves of different navigation filters.
Table 2. Comparison of calibration methods based on EKF, AEKF, QLEKF and PQEKF.
We can see from Figure 11 and Table 2 that the navigation performance of the PQEKF is slightly higher than the QLEKF, as two learning agents are implemented in parallel to select the appropriate Q r v ( s r v , m i n ) and Q b ( s b , m i n ) in parallel, while the QLEKF is designed to search for the whole Q k . It is believed that the design of the PQEKF is more flexible than the QLEKF as different scale factors can be adopted to tune the different sub-matrices in the process noise covariance matrix.
Finally, the influence of the state space discretization on the positioning accuracy of the PQEKF algorithm for the relativistic navigation system is analyzed. When the number of states is set as five, seven and nine, respectively, the position RMS error curves of the PQEKF algorithms are shown in Figure 12.
Figure 12. Position RMS error curves of PQEKF algorithms for different state numbers.
From Figure 12, the variation in the position estimation error under the different settings of the state number in the state space discretization is rather small in the majority of the simulation processes. This indicates that the influence of the state number variation within a certain scope on the estimation accuracy of the PQEKF is not evident. In the considered scenario, two pre-determined sets with a small number of design values for Q r v and Q b are beneficial for improving the performance of the calibration method.
According to the above simulation analysis, it is confirmed that the presented method is well-suited for the relativistic navigation system with the requirement to calibrate the inter-star angle measurement bias. For the simulation conditions described in Section 4.1, the achievable spacecraft navigation accuracy is on the order of a few hundred meters, which is sufficient for most orbital control missions.

5. Conclusions

This paper presents an inter-star angle measurement bias calibration method for the spacecraft relativistic navigation system. The proper design of the process noise covariance matrix is critical for accurate calibration. In order to improve the calibration accuracy and the navigation performance, the Q-learning approach is combined with the EKF for an online adaptive tuning of the process noise covariance matrix based on the measurement data achieved from onboard star sensors. The PQEKF algorithm is developed as the navigation filter, where two learning agents are implemented in parallel to select the appropriate sub-matrices related to the kinematic state and the calibration parameters, respectively. The simulation results show that the navigation performance of the presented method is superior to that of the EKF, the AEKF and the QLEKF in the presence of measurement bias, demonstrating the efficiency of the calibration method and the PQEKF algorithm. This study introduces a hybrid framework to combine the reinforcement learning approach in the navigation filter, which can serve as a foundation method to improve the state estimation accuracy in potential applications of relativistic navigation for Earth satellites or deep space explorers.

Author Contributions

Conceptualization, K.X. and Q.Z.; methodology, K.X.; software, Q.Z.; validation, K.X., Q.Z. and L.Y.; formal analysis, K.X.; writing—original draft preparation, K.X.; writing—review and editing, Q.Z.; supervision, L.Y.; project administration, K.X.; funding acquisition, L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 62394354.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Proof of Theorem 1.
First, mathematical induction is adopted to prove the following inequality
Q i s , a Q i 1 s , a .
For i = 1 , from (10), we have
Q 1 s , a = U s , a + γ Q 0 s , a 0 ( s ) .
Considering the condition shown in (12), we obtain
Q 1 s , a Q 0 s , a .
It is seen from (A3) that the inequality (A1) holds for i = 1 . Assume that the inequality (A1) holds for i = l , i.e.,
Q l s , a Q l 1 s , a .
It can be derived from (10) and (11) that
Q l s , a = U s , a + γ Q l 1 s , a l 1 ( s ) U s , a + γ Q l s , a l 1 ( s ) U s , a + γ   min a A Q l s , a = U s , a + γ Q l s , a l ( s ) = Q l + 1 s , a .
From (A5), the inequality (A1) holds for i = l + 1 . Thus, for i = 1,2 , , the inequality (A1) holds. The mathematical induction is complete.
Second, considering that Q i s , a is a bounded non-increasing sequence, let
Q s , a = lim i Q i s , a
a s = a r g   min a A Q s , a .
Taking the limit of both sides of equation (10) yields
Q s , a = U s , a + γ Q s , a ( s ) .
The formulation of (A8) is essentially the same as that of (15). This completes the proof of Theorem 1. □

Appendix B

Proof of Lemma 1.
Mathematical induction is adopted to prove Lemma 1. For i = 1 , let
a 1 ( 0 ) s = a r g   min a 1 A Q 1 ( 0 ) s , a 1
and
a 2 ( 0 ) s = a r g   min a 2 A Q 2 ( 0 ) s , a 2 .
Considering the condition shown in (19), we have
Q 1 ( 0 ) s , a 1 ( 0 ) s Q 1 ( 0 ) s , a 2 ( 0 ) s Q 2 ( 0 ) s , a 2 ( 0 ) s
It is derived from (10) and (A11) that
Q 1 ( 1 ) s , a = U s , a + γ Q 1 ( 0 ) s , a 1 ( 0 ) s U s , a + γ Q 2 ( 0 ) s , a 2 ( 0 ) s = Q 2 ( 1 ) s , a .
It is seen from (A12) that the inequality (20) holds for i = 1 . Assume that the inequality (20) holds for i = l , i.e.,
Q 1 ( l ) s , a Q 2 ( l ) s , a .
Let
a 1 ( l ) s = a r g   min a 1 A Q 1 ( l ) s , a 1
and
a 2 ( l ) s = a r g   min a 2 A Q 2 ( l ) s , a 2 .
From (A13), we obtain
Q 1 ( l ) s , a 1 ( l ) s Q 1 ( l ) s , a 2 ( l ) s Q 2 ( l ) s , a 2 ( l ) s .
It is derived from (10) and (A16) that
Q 1 ( l + 1 ) s , a = U s , a + γ Q 1 ( l ) s , a 1 ( l ) s U s , a + γ Q 2 l s , a 2 l s = Q 2 ( l + 1 ) s , a
From (A17), the inequality (20) holds for i = l + 1 . Thus, for i = 1,2 , , the inequality (20) holds. This completes the proof of Lemma 1. □

Appendix C

Proof of Lemma 2.
Mathematical induction is adopted to prove Lemma 2. For t = 1 , it follows from (21) and (22) that
( L 1 Q ¯ i ) s , a = U s , a + γ   min a A Q ¯ i s , a = U s , a + γ   min a A Q i s , a + γ ϵ .
From (17), we have
Q i + 1 s , a = U s , a + γ   min a A Q i s , a .
Inserting (A19) into (A18), we obtain
( L 1 Q ¯ i ) s , a = Q i + 1 s , a + γ ϵ .
It is seen from (A20) that the equality (23) holds for t = 1 .
Assume that the inequality (23) holds for t = l , i.e.,
( L l Q ¯ i ) s , a = Q i + l s , a + γ l ϵ .
It follows from (22) and (A21) that
( L l + 1 Q ¯ i ) s , a = U s , a + γ   min a A ( L l Q ¯ i ) s , a = U s , a + γ   min a A Q i + l s , a + γ l + 1 ϵ .
Considering that
Q i + l + 1 s , a = U s , a + γ   min a A Q i + l s , a
The Equation (A22) becomes
L l + 1 Q ¯ i s , a = Q i + l + 1 s , a + γ l + 1 ϵ .
From (A24), the inequality (23) holds for t = l + 1 . Thus, for i = 1,2 , , the inequality (23) holds. This completes the proof of Lemma 2. □

Appendix D

Proof of Theorem 2.
It is easy to see from (25) that
ϵ ¯ i Q i s , a Q i 1 s , a
or
Q i s , a Q i 1 s , a + ϵ ¯ i .
According to Lemma 1, we get the following inequality
Q i + 1 s , a L 1 Q i 1 + ϵ ¯ i s , a .
According to Lemma 2, the right side of (A27) is written as
L 1 Q i 1 + ϵ ¯ i s , a = Q i s , a + γ ϵ ¯ i .
Substituting (A28) into (A27) yields
Q i + 1 s , a Q i s , a + γ ϵ ¯ i .
Furthermore, it is derived according to Lemmas 1 and 2 that
Q i + 2 s , a L 1 Q i + γ ϵ ¯ i s , a
and
L 1 Q i + ϵ ¯ i s , a = Q i + 1 s , a + γ 2 ϵ ¯ i Q i s , a + γ ϵ ¯ i + γ 2 ϵ ¯ i .
Substituting (A31) into (A30), we have
Q i + 2 s , a Q i s , a + ( γ + γ 2 ) ϵ ¯ i .
With a similar process, it is easy to verify that
Q i + t s , a Q i s , a + ( j = 1 t γ j ) ϵ ¯ i .
In the case t , the inequality (A33) is expressed as
lim t Q i + t s , a Q i s , a + γ 1 γ ϵ ¯ i .
According to Theorem 1, we obtain
Q * s , a Q i s , a + γ 1 γ ϵ ¯ i
Similarly, applying Lemmas 1 and 2, we get the following inequality from (26)
Q * s , a Q i s , a + γ 1 γ ϵ _ i
Combining the inequalities (A35) and (A36), we conclude that the inequality (24) holds. This completes the proof of Theorem 2.  □

Appendix E

Proof of Theorem 3.
According to Theorem 2, we have
Q i s , a + γ 1 γ min s S , a A [ Q i s , a Q i 1 s , a ] Q * s , a .
From (27), we obtain
Q i s , a Q ^ i s , a ε i ( s , a )
and
Q i 1 s , a Q ^ i 1 s , a + ε i 1 ( s , a ) .
Substituting (A38) and (A39) into (A37) yields
                  Q ^ i s , a ε i s , a + γ 1 γ min s S , a A Q ^ i s , a ε i s , a Q ^ i 1 s , a ε i 1 s , a Q * s , a
From (31), the inequality (A40) is rewritten as
Q ^ i s , a ε ¯ i + γ 1 γ min s S , a A [ Q ^ i s , a ε ¯ i Q ^ i 1 s , a ε ¯ i ] Q * s , a
or
Q ^ i s , a c _ i Q * s , a .
It is derived with a similar process that
Q ^ i s , a + c ¯ i Q * s , a
Combining the inequalities (A42) and (A43), we conclude that the inequality (28) holds. This completes the proof of Theorem 3.  □

References

  1. Huang, J.; Yang, R.; Zhan, X. Constraint Navigation Filter for Space Vehicle Autonomous Positioning with Deficient GNSS Measurements. Aerosp. Sci. Technol. 2022, 120, 107291. [Google Scholar] [CrossRef]
  2. Ely, T.A.; Seubert, J.; Bradley, N.; Drain, T.; Bhaskaran, S. Radiometric Autonomous Navigation Fused with Optical for Deep Space Exploration. J. Astronaut. Sci. 2021, 68, 300–325. [Google Scholar] [CrossRef]
  3. Gallo, E.; Barrientos, A. Reduction of GNSS-Denied Inertial Navigation Errors for Fixed Wing Autonomous Unmanned Air Vehicles. Aerosp. Sci. Technol. 2022, 120, 107237. [Google Scholar] [CrossRef]
  4. Hu, J.; Liu, J.; Wang, Y.; Ning, X. INS/CNS/DNS/XNAV Deep Integrated Navigation in a Highly Dynamic Environment. Aircr. Eng. Aerosp. Technol. 2023, 95, 180–189. [Google Scholar] [CrossRef]
  5. Yang, Y.; Han, X.; Song, N.; Wang, Z. A New Method to Improve the Measurement Accuracy of Autonomous Astronomical Navigation. J. Math. 2022, 2022, 3649662. [Google Scholar] [CrossRef]
  6. Wang, Y.; Yan, T.; Wang, L. Development Situation and Trend of Space Intelligent Navigation Technology. Aerosp. Control Appl. 2022, 48, 9–17. [Google Scholar]
  7. Zhou, B.; Li, Y.; Zhang, A.; Cui, S. Observability Analysis of Satellite Autonomous Orbit Determination with Modeling and Measurement Errors. Chin. Space Sci. Technol. 2023, 43, 25–34. [Google Scholar]
  8. Christian, J.A. Optical Navigation Using Planet’s Centroid and Apparent Diameter in Image. J. Guid. Control. Dyn. 2015, 38, 192–204. [Google Scholar] [CrossRef]
  9. Hou, B.; Wang, J.; Zhou, H.; He, Z.; Li, D.; Liu, X. Guidepost-based Autonomous Orbit Determination Method for GEO Satellite. Adv. Space Res. 2021, 67, 1090–1113. [Google Scholar] [CrossRef]
  10. Turan, E.; Speretta, S.; Gill, E. Autonomous navigation for deep space small satellites: Scientific and technological advances. Acta Astronaut. 2022, 193, 56–74. [Google Scholar] [CrossRef]
  11. Sheikh, S.I.; Pines, D.J. Spacecraft Navigation Using X-Ray Pulsars. J. Guid. Control. Dyn. 2006, 29, 49–63. [Google Scholar] [CrossRef]
  12. Wang, Y.; Zheng, W.; Ge, M.; Zheng, S.; Zhang, S. Use of Statistical Linearization for Nonlinear Least-Squares Problems in Pulsar Navigation. J. Guid. Control. Dyn. 2023, 46, 1850–1855. [Google Scholar] [CrossRef]
  13. Zoccarato, P.; Larese, S.; Naletto, G.; Zampieri, L.; Brotto, F. Deep Space Navigation by Optical Pulsars. J. Guid. Control. Dyn. 2023, 46, 1501–1511. [Google Scholar] [CrossRef]
  14. Zhang, W. A Study of the Navigation Technology and Application Based on Astronomical Spectral Velocity Measurement. Navig. Control 2020, 19, 64–73. [Google Scholar]
  15. Liu, J.; Wang, T.; Ning, X.; Kang, Z. Modelling and analysis of celestial Doppler difference velocimetry navigation considering solar characteristics. IET Radar Sonar Navig. 2020, 14, 1897–1904. [Google Scholar] [CrossRef]
  16. Gui, M.; Yang, H.; Ning, X.; Ye, W.; Wei, C. A Novel Sun Direction/Solar Disk Velocity Difference Integrated Navigation Method Against Installation Error of Spectrometer Array. IEEE Sens. J. 2023, 23, 17480–17490. [Google Scholar] [CrossRef]
  17. Christian, J.A. StarNAV: Autonomous Optical Navigation of a Spacecraft by the Relativistic Perturbation of Starlight. Sensors 2019, 19, 4064. [Google Scholar] [CrossRef]
  18. Bailer-Jones, C.A.L. Lost in Space? Relativistic Interstellar Navigation using an Astrometric Star Catalog. Publ. Astron. Soc. Pac. 2021, 133, 074502. [Google Scholar] [CrossRef]
  19. McKee, P.; Kowalski, J.; Christian, J. Navigation and star identification for an interstellar mission. Acta Astronaut. 2022, 192, 390–401. [Google Scholar] [CrossRef]
  20. Klioner, S. A Practical Relativistic Model for Microarcsecond Astrometry in Space. Astron. J. 2003, 125, 1580–1597. [Google Scholar] [CrossRef]
  21. McKee, P.; Nguyen, H.; Kudenov, M.W.; Christian, J.A. StarNAV with a wide field-of-view optical sensor. Acta Astron. 2022, 197, 220–234. [Google Scholar] [CrossRef]
  22. Yucalan, D.; Peck, M. Autonomous Navigation of Relativistic Spacecraft in Interstellar Space. J. Guid. Control Dyn. 2021, 44, 1106–1115. [Google Scholar] [CrossRef]
  23. Xiong, K.; Wei, C. Integrated Celestial Navigation for Spacecraft Using Interferometer and Earth Sensor. Proc. Inst. Mech. Eng. Part G: J. Aerosp. Eng. 2020, 234, 2248–2262. [Google Scholar] [CrossRef]
  24. Xiong, K.; Wei, C.; Zhou, P. Integrated Autonomous Optical Navigation Using Q-Learning Extended Kalman Filter. Aircr. Eng. Aerosp. Technol. 2022, 94, 848–861. [Google Scholar] [CrossRef]
  25. Gui, M.; Wei, Y.; Ning, X. Celestial angle measurement navigation for Mars probe considering relativistic effect. J. Deep Space Explor. 2023, 10, 126–132. [Google Scholar]
  26. Liu, F.; Li, M.; Peng, Y.; Sun, J.; Liu, J. An autonomous navigation method for spacecraft in cislunar space using stellar aberration observation. J. Deep Space Explor. 2023, 10, 159–168. [Google Scholar]
  27. Ullah, I.; Fayaz, M.; Naveed, N.; Kim, D. ANN Based Learning to Kalman Filter Algorithm for Indoor Environment Prediction in Smart Greenhouse. IEEE Access 2020, 8, 159371–159388. [Google Scholar] [CrossRef]
  28. Or, B.; Klein, I. A Hybrid Model and Learning-Based Adaptive Navigation Filter. IEEE Trans. Instrum. Meas. 2022, 71, 1–11. [Google Scholar] [CrossRef]
  29. Ning, X.; Li, Z.; Wu, W.; Yang, Y.; Fang, J.; Liu, G. Recursive Adaptive Filter Using Current Innovation for Celestial Navigation During the Mars Approach Phase. Sci. China-Inf. Sci. 2017, 60, 032205. [Google Scholar] [CrossRef]
  30. Li, W.; Sun, S.; Jia, Y.; Du, J. Robust unscented Kalman filter with adaptation of process and measurement noise covariances. Digit. Signal Process. 2016, 48, 93–103. [Google Scholar] [CrossRef]
  31. Jia, W.; Tian, Y.; Duan, H.; Luo, R.; Lian, J.; Ruan, C.; Zhao, D.; Li, C. Autonomous Navigation Control Based on Improved Adaptive Filtering for Agricultural Robot. Int. J. Adv. Robot. Syst. 2020, 17, 1729881420925357. [Google Scholar] [CrossRef]
  32. Xiong, K.; Zhou, P.; Wei, C. Autonomous Navigation of Unmanned Aircraft Using Space Target LOS Measurements and QLEKF. Sensors 2022, 22, 6992. [Google Scholar] [CrossRef] [PubMed]
  33. Tao, W.; Zhang, J.; Hu, H.; Zhang, J.; Sun, H.; Zeng, Z.; Song, J.; Wang, J. Intelligent Navigation for the Cruise Phase of Solar System Boundary Exploration Based on Q-learning EKF. Complex Intell. Syst. 2024, 2, 2653–2672. [Google Scholar] [CrossRef]
  34. Xiong, K.; Wei, C.; Zhang, H. Q-learning for noise covariance adaptation in extended Kalman filter. Asian J. Control. 2021, 23, 1803–1816. [Google Scholar] [CrossRef]
  35. Chen, C.; Wu, X.; Bo, Y.; Chen, Y.; Liu, Y.; Alsaadi, F.E. SARSA in extended Kalman Filter for complex urban environments positioning. Int. J. Syst. Sci. 2021, 52, 3044–3059. [Google Scholar] [CrossRef]
  36. Yin, Y.; Li, S.E.; Tang, K.; Cao, W.; Wu, W.; Li, H. Approximate optimal filter design for vehicle system through Actor-Critic reinforcement learning. Automot. Innov. 2022, 5, 415–426. [Google Scholar] [CrossRef]
  37. Crassidis, J.L.; Markley, F.L.; Cheng, Y. Survey of Nonlinear Attitude Estimation Methods. J. Guid. Control. Dyn. 2007, 30, 12–28. [Google Scholar] [CrossRef]
  38. Hu, Z.; Gong, W. Constrained Evolutionary Optimization Based on Reinforcement Learning Using the Objective Function and Constraints. Knowl.-Based Syst. 2022, 237, 107731. [Google Scholar] [CrossRef]
  39. Jang, B.; Kim, M.; Harerimana, G.; Kim, J.W. Q-learning Algorithms: A Comprehensive Classification and Applications. IEEE Access 2019, 7, 133653–133667. [Google Scholar] [CrossRef]
  40. Li, Y.; Yang, C.; Hou, Z.; Feng, Y.; Yin, C. Data-driven approximate Q-learning stabilization with optimality error bound analysis. Automatica 2019, 103, 435–442. [Google Scholar] [CrossRef]
  41. Shi, H.; Li, X.; Hwang, K.; Pan, W.; Xu, G. Decoupled Visual Servoing with Fuzzy Q-learning. IEEE Trans. Ind. Inform. 2018, 14, 241–252. [Google Scholar] [CrossRef]
  42. Wu, G. UAV-Based Interference Source Localization: A Multi-model Q-learning Approach. IEEE Access 2019, 7, 137982–137991. [Google Scholar] [CrossRef]
  43. Maia, R.; Mendes, J.; Araujo, R.; Silva, M.; Nunes, U. Regenerative Braking System Modeling by Fuzzy Q-Learning. Eng. Appl. Artif. Intell. 2020, 93, 103712. [Google Scholar] [CrossRef]
  44. Wei, Q.; Lewis, F.L.; Sun, Q.; Yan, P.; Song, R. Discrete-time Deterministic Q-learning: A Novel Convergence Analysis. IEEE Trans. Cybern. 2017, 47, 1224–1237. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Article Metrics

Citations

Article Access Statistics

Multiple requests from the same IP address are counted as one view.