Inter-Spacecraft Rapid Transfer Alignment Based on Attitude Plus Angular Rate Matching Using Q-Learning Kalman Filter

Kai Xiong; Peng Zhou; Xiangyu Huang

doi:10.3390/s25092774


Science and Technology on Space Intelligent Control Laboratory, Beijing Institute of Control Engineering, Beijing 100190, China

* Author to whom correspondence should be addressed.

Sensors 2025, 25(9), 2774; https://doi.org/10.3390/s25092774

This article belongs to the Section Physical Sensors


Abstract

This study addresses transfer alignment between a master spacecraft and a slave spacecraft in the scenario where the slave spacecraft is mounted on the master spacecraft before release and must be ready to depart and perform its space mission independently. The challenge of transfer alignment is to estimate the attitude and the calibration parameters of the gyroscope unit (GU) on the slave spacecraft, using the attitude determination system (ADS) of the master spacecraft as the reference. To improve the accuracy and rapidity of the transfer alignment, a novel attitude plus angular rate matching scheme is presented that uses fused sensor information on the master spacecraft. Accordingly, a fifteen-dimensional state-space model is derived to estimate the spacecraft attitude, the GU bias, the scale factor error and the misalignment simultaneously. A Q-learning Kalman filter (QKF) is designed to fine-tune the process noise covariance matrix related to the calibration parameters, which improves the state estimation performance. The simulation results show that the presented attitude plus angular rate matching scheme performs better than the traditional attitude matching scheme, and that the QKF outperforms both the standard Kalman filter (KF) and the adaptive Kalman filter (AKF).

Keywords:

spacecraft; attitude determination; transfer alignment; Q-learning; Kalman filter

1. Introduction

In recent years, many space systems have comprised a master spacecraft and a slave spacecraft, such as CubeSats deployed from the space station [1], a lander or rover released from a deep-space probe in orbit around a planet [2] and a smart impactor launched from a mother flyby spacecraft [3]. Prior to the release of the slave spacecraft, its attitude has to be initialized and the systematic errors of its gyroscope unit (GU) should be calibrated, so that the slave spacecraft can perform its space mission independently. The attitude determination accuracy of the slave spacecraft depends on the calibration accuracy of the systematic errors, such as the gyroscope bias, scale factor error and misalignment [4,5,6]. Although ground calibration can readily achieve high accuracy [7,8], the values of the calibration parameters may change due to the differences between the ground and space environments [9]. The primary cause of the change in the misalignment, which is the dominant error affecting the attitude determination accuracy, is the thermal distortion of the GU bracket and the spacecraft body, which shifts the pointing of the GU axes in the space environment. To cope with this problem, the attitude determination system (ADS) of the master spacecraft can be used as the reference for on-orbit calibration [10]. Typically, high-precision gyroscopes and star sensors are mounted on the master spacecraft for attitude determination [11,12,13]. Accurate and rapid transfer alignment is therefore crucial to guarantee the attitude determination performance of the slave spacecraft [14,15].

The main problem of the traditional inter-spacecraft transfer alignment method is its rather slow convergence. Typically, tens of minutes are required for the traditional transfer alignment process [16], which is unsatisfactory for quick-response situations. To achieve faster convergence and higher accuracy, rapid transfer alignment techniques are investigated in this paper.

The measurement information matching scheme is an important research topic in the field of transfer alignment. Many researchers have focused on on-orbit calibration of the GU based on the attitude matching scheme [17,18]. The traditional method estimates the attitude and the calibration parameters of the GU on the slave spacecraft from the attitude measurements provided by the star sensors on the master spacecraft using a Kalman filter (KF) [19,20]. High-accuracy measurement information obtained from the satellite payload and from GNSS (global navigation satellite system) has also been taken into account for calibration [21,22]. In addition, for tactical weapon applications, the velocity plus attitude matching scheme [23,24] has been developed as an improvement on the conventional velocity matching scheme [25,26,27]. Calibration methods based on position matching and acceleration matching schemes are studied in [28,29].

On the basis of measurement information matching, the performance of the inter-spacecraft transfer alignment method depends on the filtering algorithm. The KF is the most widely used state estimation approach in space missions [30,31,32]. The estimation accuracy of the KF depends on the process and measurement noise covariance matrices $Q_{k}$ and $R_{k}$ in the system model. If the prior statistical characteristics of the process and measurement noises are inaccurate, the performance of the KF may degrade noticeably. For the inter-spacecraft transfer alignment system, the statistical characteristics of the gyroscopes and the star sensors can be obtained from the manufacturer, whereas the process noise covariance related to the calibration parameters is often not known exactly. A common approach to this problem is to estimate the unknown noise covariance matrix with an adaptive Kalman filter (AKF) [33,34]. In the AKF, the estimate of $Q_{k}$ is updated recursively based on the measurement innovation, which is calculated with the previous state estimate in each iteration. However, when the previous state estimate is inaccurate and the prior statistical information is far from the actual situation, it is difficult to obtain a proper $Q_{k}$ with the AKF [35]. To the best of the authors' knowledge, the optimal approach to tuning the process noise covariance matrix is still an open question.

Motivated by the transfer alignment methods developed for tactical weapon applications and by the AKF algorithm, this paper presents a practical method to improve the accuracy and rapidity of inter-spacecraft transfer alignment. The main contributions of the paper are as follows:

(1): An attitude plus angular rate matching scheme is presented, where the fused information from the star sensors and the gyroscopes on the master spacecraft is adopted for the calibration of the GU on the slave spacecraft. Accordingly, the state equation and the measurement equation are derived as the transfer alignment system model. Compared with the traditional attitude matching scheme, the main advantage of the presented scheme is that more measurement information is utilized, such that the alignment performance is improved in limited time.
(2): A framework of the Q-learning Kalman filter (QKF) that combines the celebrated Q-learning approach [36,37,38] with the KF is developed to fine-tune the process noise covariance matrix related to the calibration parameters. Instead of the recursive estimation used in the AKF, the Q-learning approach is designed to explore for an appropriate $Q_{k}$. Once an appropriate $Q_{k}$ is learned, it is plugged into the model-based KF to enhance its performance. Compared with our previous works [39,40], the main advantage of the presented QKF is that only one explorative filter (instead of a group of parallel filters) is required in the Q-learning process, which simplifies the implementation of the algorithm. Correspondingly, the computational load is decreased, which facilitates the application of the algorithm on spacecraft with limited computing resources.

This study is structured as follows. In Section 2, the attitude plus angular rate matching scheme and the inter-spacecraft rapid transfer alignment system model are presented. In Section 3, the QKF algorithm for rapid transfer alignment is provided. In Section 4, the potential performance of the transfer alignment system is analyzed via the calculation of the Cramer–Rao lower bounds (CRLB) [41]. In Section 5, the estimation performance of the KF, the AKF and the presented QKF designed based on the transfer alignment system model are compared via simulations. Finally, the conclusions are drawn in the last section.

2. Attitude Plus Angular Rate Matching Scheme

2.1. Main Idea

The inter-spacecraft rapid transfer alignment model is derived to calibrate the GU on the slave spacecraft based on the ADS on the master spacecraft. The basic principle of the considered transfer alignment system is illustrated in Figure 1.

Figure 1. Diagram of inter-spacecraft rapid transfer alignment.

As shown in Figure 1, on the master spacecraft, the ADS consisting of gyroscopes and star sensors is used to obtain the attitude reference information. On the slave spacecraft, the transfer alignment filter combines the attitude reference information with the GU measurements to estimate the attitude, the GU bias and the calibration parameters. The transfer alignment filter is designed based on the transfer alignment model composed of the state equation and the measurement equation. The calibration parameters include the scale factor error and the misalignment of the GU on the slave spacecraft. The construction of the transfer alignment model and the parameterization of the systematic errors fit within the spacecraft attitude determination framework [10].

In this paper, the spacecraft attitude describes the rotation of the spacecraft body frame relative to the geocentric equatorial inertial frame. For the geocentric equatorial inertial frame, the origin is the Earth's center, the X axis points toward the vernal equinox, the Z axis points toward the North Pole and the Y axis completes the right-handed coordinate system. For the spacecraft body frame, the origin is the center of mass of the spacecraft, and the X, Y and Z axes are parallel to the principal axes of inertia and form a right-handed coordinate system.

2.2. State Equation

For the design of the transfer alignment filter, the state equation of the transfer alignment model is established based on the attitude kinematics and the error model of the GU on the slave spacecraft. In this paper, the GU is assumed to consist of three gyroscopes that measure the angular rate of the slave spacecraft relative to the inertial frame. The GU error model includes the gyroscope bias, scale factor error, misalignment and random noise composed of angle random walk (ARW) and rate random walk (RRW), as shown in Figure 2.

Figure 2. Diagram of gyroscope error model.

In the GU error model, the measured angular rate $ω_{g k}$ is related to the true angular rate $ω_{b k}$ of the slave spacecraft relative to the inertial frame, expressed in the body frame, by [21]

$$ω_{g k} = (I + Λ_{k}) (I + Δ_{k}) C_{b}^{g} ω_{b k} + b_{k} + η_{a k}, \tag{1}$$

where $I$ denotes the identity matrix of compatible dimension, $Λ_{k}$ is the scale factor error matrix, $Δ_{k}$ is the misalignment matrix, $C_{b}^{g}$ is the attitude transformation matrix from the body frame to the sensor frame, $b_{k}$ is the GU bias, $η_{a k}$ is the random noise called angle random walk, and the subscript $k$ denotes discrete time. The true angular rate is written as $ω_{b k} = [ω_{b x k} \; ω_{b y k} \; ω_{b z k}]^{T}$, where $ω_{b x k}$, $ω_{b y k}$ and $ω_{b z k}$ are its three axis components. The scale factor error matrix is given by

$$Λ_{k} = \begin{bmatrix} λ_{x k} & 0 & 0 \\ 0 & λ_{y k} & 0 \\ 0 & 0 & λ_{z k} \end{bmatrix}, \tag{2}$$

where $λ_{x k}$, $λ_{y k}$ and $λ_{z k}$ are the scale factor error parameters. The misalignment matrix is described as

$$Δ_{k} = \begin{bmatrix} 0 & \bar{δ}_{x y k} & \bar{δ}_{x z k} \\ \bar{δ}_{y x k} & 0 & \bar{δ}_{y z k} \\ \bar{δ}_{z x k} & \bar{δ}_{z y k} & 0 \end{bmatrix}, \tag{3}$$

where $\bar{δ}_{x y k}$, $\bar{δ}_{x z k}$, $\bar{δ}_{y x k}$, $\bar{δ}_{y z k}$, $\bar{δ}_{z x k}$ and $\bar{δ}_{z y k}$ are the elements of the misalignment matrix. From Equations (2) and (3), we have

$$(I + Λ_{k}) (I + Δ_{k}) = I + M_{k}, \tag{4}$$

with

$$M_{k} = \begin{bmatrix} λ_{x k} & δ_{x y k} & δ_{x z k} \\ δ_{y x k} & λ_{y k} & δ_{y z k} \\ δ_{z x k} & δ_{z y k} & λ_{z k} \end{bmatrix}, \tag{5}$$

where

$$δ_{x y k} = (1 + λ_{x k}) \bar{δ}_{x y k}, \tag{6}$$

$$δ_{x z k} = (1 + λ_{x k}) \bar{δ}_{x z k}, \tag{7}$$

$$δ_{y x k} = (1 + λ_{y k}) \bar{δ}_{y x k}, \tag{8}$$

$$δ_{y z k} = (1 + λ_{y k}) \bar{δ}_{y z k}, \tag{9}$$

$$δ_{z x k} = (1 + λ_{z k}) \bar{δ}_{z x k}, \tag{10}$$

$$δ_{z y k} = (1 + λ_{z k}) \bar{δ}_{z y k}, \tag{11}$$

and $δ_{x y k}$, $δ_{x z k}$, $δ_{y x k}$, $δ_{y z k}$, $δ_{z x k}$ and $δ_{z y k}$ are the misalignment parameters. Note that the products of the scale factor errors and the misalignments are absorbed into Equation (5) to simplify the formulation; Equation (4) holds exactly, because the product $Λ_{k} Δ_{k}$ simply scales the $i$-th row of $Δ_{k}$ by the corresponding scale factor error. Substituting Equation (4) into Equation (1), the GU error model is reformulated as

$$ω_{g k} = (I + M_{k}) C_{b}^{g} ω_{b k} + b_{k} + η_{a k}. \tag{12}$$
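The identity in Equation (4) can be checked numerically. The following minimal Python/NumPy sketch builds $M_{k}$ from Equations (5) to (11) and compares $I + M_{k}$ with $(I + Λ_{k})(I + Δ_{k})$. The helper name `build_M` and all numerical values are hypothetical and serve only as an illustration.

```python
import numpy as np

def build_M(lam, dbar):
    """Assemble M_k of Eq. (5) (illustrative helper, not from the paper).

    lam  : length-3 scale factor errors [lam_x, lam_y, lam_z]
    dbar : 3x3 raw misalignment matrix Delta_k of Eq. (3); its diagonal is ignored
    """
    lam = np.asarray(lam, dtype=float)
    Delta = np.array(dbar, dtype=float)      # copy so the caller's array is untouched
    np.fill_diagonal(Delta, 0.0)
    M = (1.0 + lam)[:, None] * Delta         # Eqs. (6)-(11): row i scaled by (1 + lam_i)
    M[np.diag_indices(3)] = lam              # scale factor errors on the diagonal
    return M

# Hypothetical values, chosen only to exercise the identity of Eq. (4)
lam = np.array([1e-3, -2e-3, 5e-4])
dbar = np.array([[0.0, 2e-4, -1e-4],
                 [3e-4, 0.0, 5e-5],
                 [-2e-4, 1e-4, 0.0]])
lhs = (np.eye(3) + np.diag(lam)) @ (np.eye(3) + dbar)
rhs = np.eye(3) + build_M(lam, dbar)
print(np.allclose(lhs, rhs))  # True
```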

To derive the attitude error equation, the estimate of the spacecraft angular rate $\hat{ω}_{b k}$ is written as

$$\hat{ω}_{b k} = C_{g}^{b} (ω_{g k} - \hat{b}_{k}), \tag{13}$$

where $\hat{b}_{k}$ is the estimate of the GU bias and $C_{g}^{b}$ is the attitude transformation matrix from the sensor frame to the body frame. Substituting Equation (12) into Equation (13), we have

$$\hat{ω}_{b k} = C_{g}^{b} (I + M_{k}) C_{b}^{g} ω_{b k} + C_{g}^{b} δb_{k} + C_{g}^{b} η_{a k}, \tag{14}$$

where

$$δb_{k} = b_{k} - \hat{b}_{k} \tag{15}$$

is the GU bias error. The angular rate estimation error is defined as

$$δω_{b k} = ω_{b k} - \hat{ω}_{b k}. \tag{16}$$

Substituting Equation (14) into Equation (16) and using $C_{g}^{b} C_{b}^{g} = I$ yields

$$δω_{b k} = - C_{g}^{b} M_{k} C_{b}^{g} ω_{b k} - C_{g}^{b} δb_{k} - C_{g}^{b} η_{a k}. \tag{17}$$

Since $M_{k} C_{b}^{g} ω_{b k} = Ω_{g k} δ_{k}$, Equation (17) is reformulated as

$$δω_{b k} = - C_{g}^{b} Ω_{g k} δ_{k} - C_{g}^{b} δb_{k} - C_{g}^{b} η_{a k}, \tag{18}$$

with

$$Ω_{g k} = \begin{bmatrix} (C_{b}^{g} ω_{b k})^{T} & 0_{1 \times 3} & 0_{1 \times 3} \\ 0_{1 \times 3} & (C_{b}^{g} ω_{b k})^{T} & 0_{1 \times 3} \\ 0_{1 \times 3} & 0_{1 \times 3} & (C_{b}^{g} ω_{b k})^{T} \end{bmatrix} \tag{19}$$

and

$$δ_{k} = [λ_{x k} \; δ_{x y k} \; δ_{x z k} \; δ_{y x k} \; λ_{y k} \; δ_{y z k} \; δ_{z x k} \; δ_{z y k} \; λ_{z k}]^{T}, \tag{20}$$

which is the calibration parameter vector.
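The stacking in Equations (19) and (20) is chosen so that $Ω_{g k} δ_{k}$ reproduces $M_{k} C_{b}^{g} ω_{b k}$. A minimal Python/NumPy sketch of this bookkeeping follows; the helper names `omega_g` and `calib_vector`, the random $M_{k}$ and the example rotation are assumptions made for illustration.

```python
import numpy as np

def omega_g(C_bg, w_b):
    """Omega_{g,k} of Eq. (19): 3x9 block diagonal with (C_b^g w_b)^T on the diagonal."""
    u = C_bg @ w_b                           # body rate expressed in the sensor frame
    return np.kron(np.eye(3), u[None, :])    # row i holds u^T in columns 3i..3i+2

def calib_vector(M):
    """delta_k of Eq. (20): the rows of M_k stacked into a 9-vector."""
    return np.asarray(M, dtype=float).reshape(9)

# Hypothetical inputs: a small random M_k and a sensor frame rotated 0.3 rad about z
rng = np.random.default_rng(0)
M = 1e-3 * rng.standard_normal((3, 3))
th = 0.3
C_bg = np.array([[np.cos(th), -np.sin(th), 0.0],
                 [np.sin(th),  np.cos(th), 0.0],
                 [0.0,         0.0,        1.0]])
w_b = np.array([0.01, -0.02, 0.005])
print(np.allclose(M @ C_bg @ w_b, omega_g(C_bg, w_b) @ calib_vector(M)))  # True
```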

For spacecraft attitude determination, the error quaternion $δq_{k}$ is defined as

$$δq_{k} = q_{k} \otimes \hat{q}_{k}^{-1}, \tag{21}$$

where $q_{k}$ is the attitude quaternion that describes the spacecraft attitude relative to the inertial frame and $\hat{q}_{k}$ is the estimate of $q_{k}$. According to the spacecraft attitude kinematics, the propagation of the error quaternion is described by the following perturbation equation [11]:

$$δρ_{k + 1} = (I - τ [ω_{b k} \times]) δρ_{k} + \frac{τ}{2} δω_{b k}, \tag{22}$$

where $δρ_{k}$ is the vector part of the error quaternion, $τ$ denotes the discretization time interval and $[ω_{b k} \times]$ is the skew-symmetric matrix defined as

$$[ω_{b k} \times] = \begin{bmatrix} 0 & - ω_{b z k} & ω_{b y k} \\ ω_{b z k} & 0 & - ω_{b x k} \\ - ω_{b y k} & ω_{b x k} & 0 \end{bmatrix}. \tag{23}$$

Substituting Equation (18) into Equation (22), the perturbation equation becomes

$$δρ_{k + 1} = (I - τ [ω_{b k} \times]) δρ_{k} - \frac{τ}{2} C_{g}^{b} Ω_{g k} δ_{k} - \frac{τ}{2} C_{g}^{b} δb_{k} - \frac{τ}{2} C_{g}^{b} η_{a k}. \tag{24}$$

Equation (24) thus describes how the GU bias, scale factor error, misalignment and random noise affect the propagation of the vector part of the error quaternion.
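A minimal Python/NumPy sketch of one noise-free step of Equation (24) is given below, assuming $Ω_{g k}$ is supplied as a 3 × 9 array. The helper names `skew` and `propagate_drho` and the test values are hypothetical; the example isolates the bias term $-\frac{τ}{2} C_{g}^{b} δb_{k}$.

```python
import numpy as np

def skew(w):
    """[w x] of Eq. (23); skew(w) @ v equals np.cross(w, v)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def propagate_drho(drho, w_b, C_gb, Om_g, delta, db, tau, eta_a=None):
    """One step of Eq. (24); eta_a defaults to zero (noise-free propagation)."""
    if eta_a is None:
        eta_a = np.zeros(3)
    return ((np.eye(3) - tau * skew(w_b)) @ drho
            - 0.5 * tau * C_gb @ (Om_g @ delta + db + eta_a))

# Hypothetical check: with zero attitude and calibration errors, a bias error
# alone moves drho by -(tau/2) C_g^b db in one step
out = propagate_drho(drho=np.zeros(3), w_b=np.array([0.0, 0.0, 0.01]),
                     C_gb=np.eye(3), Om_g=np.zeros((3, 9)), delta=np.zeros(9),
                     db=np.array([1e-5, 0.0, 0.0]), tau=0.1)
```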

For transfer alignment, the spacecraft attitude, the GU bias, the scale factor error and the misalignment should be estimated. Accordingly, the state vector is constructed as the combination of the vector part of the error quaternion $δρ_{k}$, the GU bias error $δb_{k}$ and the calibration parameter vector $δ_{k}$, which is given by

$$x_{k} = [δρ_{k}^{T} \; δb_{k}^{T} \; δ_{k}^{T}]^{T}. \tag{25}$$

From Equations (24) and (25), we obtain the state equation

$$x_{k + 1} = F_{k} x_{k} + w_{k}, \tag{26}$$

with the state transition matrix

$$F_{k} = \begin{bmatrix} I_{3 \times 3} - τ [ω_{b k} \times] & - \frac{τ}{2} C_{g}^{b} & - \frac{τ}{2} C_{g}^{b} Ω_{g k} \\ 0_{3 \times 3} & I_{3 \times 3} & 0_{3 \times 9} \\ 0_{9 \times 3} & 0_{9 \times 3} & I_{9 \times 9} \end{bmatrix}. \tag{27}$$
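The block structure of Equation (27) maps directly onto array slices of the 15-dimensional state. The Python/NumPy sketch below assembles $F_{k}$; the helper names and the example rate and time step are illustrative assumptions, and $C_{b}^{g}$ is passed separately although it equals the transpose of $C_{g}^{b}$.

```python
import numpy as np

def skew(w):
    """[w x] of Eq. (23)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]])

def transition_matrix(w_b, C_gb, C_bg, tau):
    """F_k of Eq. (27) for the 15-state vector x = [drho, db, delta]."""
    u = C_bg @ w_b
    Om_g = np.kron(np.eye(3), u[None, :])          # Omega_{g,k} of Eq. (19), 3x9
    F = np.eye(15)                                  # identity blocks for db and delta
    F[0:3, 0:3] = np.eye(3) - tau * skew(w_b)
    F[0:3, 3:6] = -0.5 * tau * C_gb
    F[0:3, 6:15] = -0.5 * tau * C_gb @ Om_g
    return F

# Hypothetical case: sensor frame aligned with the body frame, slow spin about z
F = transition_matrix(np.array([0.0, 0.0, 0.01]), np.eye(3), np.eye(3), tau=0.1)
```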

The process noise $w_{k}$ is given by

$$w_{k} = \left[ -\frac{τ}{2} (C_{g}^{b} η_{a k})^{T} \; η_{r k}^{T} \; η_{c k}^{T} \right]^{T}, \tag{28}$$

where $η_{r k}$ is the random noise that drives the rate random walk and $η_{c k}$ is introduced to describe the drift of the calibration parameters. It is usually assumed that $w_{k}$ is zero-mean Gaussian white noise. With the three noise sources assumed mutually uncorrelated, the process noise covariance matrix $Q_{k}$ is a symmetric positive definite matrix with the block-diagonal structure

$$Q_{k} = E\left( \begin{bmatrix} - \frac{τ}{2} C_{g}^{b} η_{a k} \\ η_{r k} \\ η_{c k} \end{bmatrix} \begin{bmatrix} - \frac{τ}{2} (C_{g}^{b} η_{a k})^{T} & η_{r k}^{T} & η_{c k}^{T} \end{bmatrix} \right) = \begin{bmatrix} Q_{a k} & 0 & 0 \\ 0 & Q_{r k} & 0 \\ 0 & 0 & Q_{c k} \end{bmatrix}, \tag{29}$$

where $Q_{a k}$ and $Q_{r k}$ are the sub-matrices related to the vector part of the error quaternion $δρ_{k}$ and the GU bias error $δb_{k}$, respectively, and $Q_{c k}$ is the sub-matrix related to the calibration parameter vector $δ_{k}$.
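Assuming the three noise sources are uncorrelated, the blocks of Equation (29) can be assembled as in the following Python/NumPy sketch. Here $Q_{a k}$ is obtained by propagating an assumed ARW covariance `Sa` through $-\frac{τ}{2} C_{g}^{b}$; the helper name and all noise levels are hypothetical and are not the values used in this study.

```python
import numpy as np

def process_noise_cov(C_gb, tau, Sa, Qr, Qc):
    """Block-diagonal Q_k of Eq. (29) (illustrative helper).

    Sa : 3x3 covariance of the ARW noise eta_a
    Qr : 3x3 covariance of the RRW driving noise eta_r (the Q_{r,k} block)
    Qc : 9x9 covariance of the calibration drift noise eta_c (the Q_{c,k} block)
    """
    Qa = (tau / 2.0) ** 2 * C_gb @ Sa @ C_gb.T     # covariance of -(tau/2) C_g^b eta_a
    Q = np.zeros((15, 15))
    Q[0:3, 0:3] = Qa
    Q[3:6, 3:6] = Qr
    Q[6:15, 6:15] = Qc
    return Q

# Hypothetical noise levels, not the values used in the paper
Q = process_noise_cov(np.eye(3), tau=0.1, Sa=1e-8 * np.eye(3),
                      Qr=1e-12 * np.eye(3), Qc=1e-14 * np.eye(9))
```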

2.3. Measurement Model

For the attitude plus angular rate matching scheme, both the attitude and the angular rate reference information obtained from the ADS on the master spacecraft are used for on-orbit calibration. It is expected that the calibration parameters can be estimated effectively with the attitude reference information from the master spacecraft. When both the attitude and the angular rate reference information are available, the measurement equation is written as

$$y_{k} = H_{k} x_{k} + v_{k}, \tag{30}$$

with the measurement

$$y_{k} = [y_{ρ k}^{T} \; y_{ω k}^{T}]^{T}, \tag{31}$$

the measurement matrix

$$H_{k} = \begin{bmatrix} I_{3 \times 3} & 0_{3 \times 3} & 0_{3 \times 9} \\ 0_{3 \times 3} & - C_{g}^{b} & - C_{g}^{b} Ω_{g k} \end{bmatrix}, \tag{32}$$

and the measurement noise

$$v_{k} = [v_{ρ k}^{T} \; v_{ω k}^{T}]^{T}, \tag{33}$$

where $y_{ρ k}$ is the vector part of the error quaternion between the quaternions obtained from the ADS on the master spacecraft and from the GU on the slave spacecraft, $y_{ω k}$ is the difference between the angular rates obtained from the ADS on the master spacecraft and from the GU on the slave spacecraft, $v_{ρ k}$ is the attitude measurement noise with covariance matrix $R_{ρ k}$, and $v_{ω k}$ is the angular rate measurement noise with covariance matrix $R_{ω k}$. The second block row of $H_{k}$ mirrors the noise-free part of Equation (18), since $y_{ω k}$ measures the angular rate estimation error. The measurement noise covariance matrix $R_{k}$ combines $R_{ρ k}$ and $R_{ω k}$:

$$R_{k} = \begin{bmatrix} R_{ρ k} & 0 \\ 0 & R_{ω k} \end{bmatrix}. \tag{34}$$
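The following Python/NumPy sketch builds $H_{k}$ and $R_{k}$ from Equations (32) and (34) and checks that the predicted angular rate residual equals the noise-free part of Equation (18). Helper names and all numbers are hypothetical.

```python
import numpy as np

def measurement_matrix(C_gb, Om_g):
    """H_k of Eq. (32): 6x15, attitude rows first, angular rate rows second."""
    H = np.zeros((6, 15))
    H[0:3, 0:3] = np.eye(3)          # y_rho observes drho directly
    H[3:6, 3:6] = -C_gb              # bias error term of Eq. (18)
    H[3:6, 6:15] = -C_gb @ Om_g      # scale factor and misalignment term of Eq. (18)
    return H

def measurement_noise_cov(R_rho, R_omega):
    """R_k of Eq. (34)."""
    R = np.zeros((6, 6))
    R[0:3, 0:3] = R_rho
    R[3:6, 3:6] = R_omega
    return R

# Hypothetical example: aligned sensor frame, rate of 0.01 rad/s about z
u = np.array([0.0, 0.0, 0.01])
Om_g = np.kron(np.eye(3), u[None, :])
H = measurement_matrix(np.eye(3), Om_g)
x = 1e-4 * np.arange(15, dtype=float)
y_pred = H @ x                       # noise-free predicted measurement
R = measurement_noise_cov(1e-10 * np.eye(3), 1e-12 * np.eye(3))
```

Within this state-space model, an attitude-only measurement would keep just the first block row $[I_{3 \times 3} \; 0_{3 \times 3} \; 0_{3 \times 9}]$; the second block row is what couples $δb_{k}$ and $δ_{k}$ directly to the measurement.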

Note that Equation (33) contains the measurement noises of both the ADS on the master spacecraft and the GU on the slave spacecraft. Generally, to implement the inter-spacecraft rapid transfer alignment effectively, the ADS on the master spacecraft should be more accurate than the GU on the slave spacecraft. It is also assumed that the systematic errors of the ADS on the master spacecraft have been compensated before the inter-spacecraft rapid transfer alignment is performed.

The state Equation (26) and the measurement Equation (30) compose the system model for the attitude plus angular rate matching scheme. With this system model, the transfer alignment KF is designed to perform the state estimation. It should be mentioned that attitude determination systems composed of star sensors and gyroscopes are widely used on current satellites, and the feasibility of this type of system model has been verified in multiple space missions. In general, for a novel navigation system, a hardware-in-the-loop experiment that simulates the operational environment is an effective approach for model verification.

2.4. Transfer Alignment KF

On the basis of the system model, the transfer alignment KF is designed to estimate the state vector $x_{k}$ from the measurement $y_{k}$. The procedure of the inter-spacecraft rapid transfer alignment method based on the KF, with its prediction and update steps, is summarized in Algorithm 1.

Algorithm 1: Transfer alignment Kalman filter

1: Initialize the attitude quaternion estimate $\hat{q}_{0}$, the bias estimate $\hat{b}_{0}$, the state estimate $\hat{x}_{0}$ and its estimation error covariance matrix $P_{0}$
2: for k = 1, 2, …, K, do
3:     $\bar{ω}_{b k-1} \leftarrow C_{g}^{b} (I + \hat{M}_{k-1})^{-1} (ω_{g k-1} - \hat{b}_{k-1})$
4:     $\hat{q}_{k} \leftarrow \left[ I_{4 \times 4} \cos\left(\frac{\|\bar{ω}_{b k-1}\| τ}{2}\right) + \frac{Φ(\bar{ω}_{b k-1})}{\|\bar{ω}_{b k-1}\|} \sin\left(\frac{\|\bar{ω}_{b k-1}\| τ}{2}\right) \right] \hat{q}_{k-1}$
5:     $\hat{b}_{k} \leftarrow \hat{b}_{k-1}$
6:     $\hat{x}_{k|k-1}(1:6, 1) \leftarrow 0$
7:     $\hat{x}_{k|k-1}(7:15, 1) \leftarrow \hat{x}_{k-1}(7:15, 1)$
8:     $P_{k|k-1} \leftarrow F_{k} P_{k-1} F_{k}^{T} + Q_{k}$
9:     if the measurement $y_{k}$ is available, then
10:        $K_{k} \leftarrow P_{k|k-1} H_{k}^{T} (H_{k} P_{k|k-1} H_{k}^{T} + R_{k})^{-1}$
11:        $\tilde{y}_{k} \leftarrow y_{k} - H_{k} \hat{x}_{k|k-1}$
12:        $\hat{x}_{k} \leftarrow \hat{x}_{k|k-1} + K_{k} \tilde{y}_{k}$
13:        $P_{k} \leftarrow (I - K_{k} H_{k}) P_{k|k-1} (I - K_{k} H_{k})^{T} + K_{k} R_{k} K_{k}^{T}$
14:    end if
15:    $\hat{q}_{k} \leftarrow δ\hat{q}_{k} \otimes \hat{q}_{k}$
16:    $\hat{b}_{k} \leftarrow \hat{b}_{k} + δ\hat{b}_{k}$
17: end for
18: return $\{\hat{q}_{k}\}$, $\{\hat{b}_{k}\}$, $\{\hat{x}_{k}\}$, $\{P_{k}\}$

In the algorithm, $\hat{x}_{k|k-1}$ and $\hat{x}_{k}$ are the prediction and the estimate of the state vector, $P_{k|k-1}$ and $P_{k}$ are the corresponding estimation error covariance matrices, $K$ is the length of the measurement data, $K_{k}$ is the Kalman gain and $\tilde{y}_{k}$ is the measurement innovation. The matrix $\hat{M}_{k}$ is the estimate of $M_{k}$, constructed from the estimate of the calibration parameter vector. The estimated error quaternion $δ\hat{q}_{k}$ has the vector part $\hat{x}_{k}(1:3, 1)$, and the bias correction is $δ\hat{b}_{k} = \hat{x}_{k}(4:6, 1)$. The matrix $Φ(\bar{ω}_{b k})$ is

$$Φ(\bar{ω}_{b k}) = \begin{bmatrix} -[\bar{ω}_{b k} \times] & \bar{ω}_{b k} \\ -\bar{ω}_{b k}^{T} & 0 \end{bmatrix}. \tag{35}$$
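Step 4 of Algorithm 1 is the closed-form quaternion propagation for a rate held constant over one interval $τ$. The Python/NumPy sketch below implements it with $Φ$ from Equation (35), assuming a scalar-last quaternion $[ρ^{T} \; q_{4}]^{T}$ as implied by the block layout of $Φ$. The small-rate guard `eps` and the helper names are assumptions. Because $Φ$ is skew-symmetric with $Φ^{2} = -\|\bar{ω}\|^{2} I_{4 \times 4}$, the propagation matrix is orthogonal and preserves the quaternion norm.

```python
import numpy as np

def skew(w):
    """[w x] of Eq. (23)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]])

def Phi(w):
    """Phi(w) of Eq. (35), assuming a scalar-last quaternion q = [rho; q4]."""
    P = np.zeros((4, 4))
    P[0:3, 0:3] = -skew(w)
    P[0:3, 3] = w
    P[3, 0:3] = -w
    return P

def propagate_quaternion(q, w, tau, eps=1e-12):
    """Step 4 of Algorithm 1 over one interval tau (eps is an assumed small-rate guard)."""
    n = np.linalg.norm(w)
    if n < eps:                      # limit of step 4 as ||w|| -> 0: q is unchanged
        return np.array(q, dtype=float)
    half = 0.5 * n * tau
    return (np.cos(half) * np.eye(4) + np.sin(half) / n * Phi(w)) @ q

q0 = np.array([0.0, 0.0, 0.0, 1.0])          # identity attitude, scalar last
q1 = propagate_quaternion(q0, np.array([0.0, 0.0, 0.1]), tau=1.0)
```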

To facilitate the implementation of the algorithm, similar to the method presented in [11], the attitude quaternion $\hat{q}_{k}$ is propagated directly with the spacecraft attitude kinematics equation, so the state transition matrix $F_{k}$ in the state Equation (26) is only used to propagate the estimation error covariance in the KF. Note that propagating the error quaternion $δ\hat{q}_{k}$ itself would not improve the filtering performance, since the estimated error states are folded into $\hat{q}_{k}$ and $\hat{b}_{k}$ at steps 15 and 16 and reset to zero at step 6. To handle intermittent measurement data, the prediction in Algorithm 1 is performed at every time step, whereas the update is performed only when measurement data are available.
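The prediction and update steps of Algorithm 1, including the case where no measurement arrives, can be sketched in Python/NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the helper name `kf_step` is hypothetical, the toy $H$ and $R$ below are placeholders rather than Equations (32) and (34), and the feedback of $\hat{x}_{k}(1:6, 1)$ into $\hat{q}_{k}$ and $\hat{b}_{k}$ (steps 15 and 16) is omitted.

```python
import numpy as np

def kf_step(x_prev, P_prev, F, Q, H=None, R=None, y=None):
    """Prediction (steps 6-8) and optional update (steps 9-14) of Algorithm 1.

    Returns (x, P, innovation); innovation is None when no measurement is given,
    in which case the prediction is carried forward unchanged.
    """
    x_pred = np.array(x_prev, dtype=float)
    x_pred[0:6] = 0.0                  # step 6: drho and db were fed back at steps 15-16
    P_pred = F @ P_prev @ F.T + Q      # step 8 (step 7 is the copy of x_prev[6:15])
    if y is None:
        return x_pred, P_pred, None
    S = H @ P_pred @ H.T + R
    K = np.linalg.solve(S, H @ P_pred).T         # K = P H^T S^-1 (P and S symmetric)
    innov = y - H @ x_pred                        # step 11
    x = x_pred + K @ innov                        # step 12
    I_KH = np.eye(x.size) - K @ H
    P = I_KH @ P_pred @ I_KH.T + K @ R @ K.T      # step 13, Joseph form
    return x, P, innov

# Hypothetical 15-state example with a 6-dimensional measurement
F, Q = np.eye(15), 1e-6 * np.eye(15)
H = np.hstack([np.eye(6), np.zeros((6, 9))])
R = 1e-2 * np.eye(6)
x_a, P_a, nu_a = kf_step(np.zeros(15), np.eye(15), F, Q)                 # no measurement
x_b, P_b, nu_b = kf_step(np.zeros(15), np.eye(15), F, Q, H, R, np.ones(6))
```

The Joseph form in step 13 keeps $P_{k}$ symmetric and positive semidefinite even with rounding errors, at a modest extra cost compared with $(I - K_{k} H_{k}) P_{k|k-1}$.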

As Algorithm 1 shows, how strongly the prediction $\hat{x}_{k|k-1}$ is corrected by the measurement innovation $\tilde{y}_{k}$ depends on the Kalman gain $K_{k}$, which is governed by the noise covariance matrices $Q_{k}$ and $R_{k}$. In the measurement noise covariance matrix, $R_{ρ k}$ and $R_{ω k}$ are symmetric positive definite matrices determined from the measurement error behavior of the star sensors and the GU specified by the manufacturer. In the process noise covariance matrix, the elements related to $η_{a k}$ and $η_{r k}$ are determined from the ARW and RRW coefficients, which can be obtained through Allan variance analysis [42]. However, it is difficult to determine the covariance of the calibration parameter noise $η_{c k}$ accurately in the absence of prior knowledge. As mentioned in the Introduction, a proper choice of the process noise covariance matrix is critical for accurate transfer alignment: the state estimate of the KF may deviate from its actual value if $Q_{k}$ is not set appropriately. To identify the unknown elements in $Q_{k}$, a Q-learning-based filtering algorithm is presented in the next section as a modification of the standard KF.

3. Q-Learning Kalman Filter

3.1. Q-Learning Approach

To fine-tune the elements related to the calibration parameters in the process noise covariance matrix of the transfer alignment KF, a Q-learning Kalman filter is presented, which combines the KF algorithm with the Q-learning approach. Q-learning is a representative reinforcement learning (RL) method [43]. It is built on the key idea that an agent interacting with its environment should remember successful decisions: the environment provides a reinforcement signal as feedback on each decision, so that successful decisions become more likely to be made in the future.

The Q-learning approach is implemented through a recursive trial-and-error process. In each iteration, the agent performs an action and receives an immediate reward from the environment, which indicates whether the action is good or not. The reward is accumulated to update the action selection strategy represented by the Q-function. Then, an action is selected according to the updated strategy for the next iteration. This process is repeated many times, so that the agent tends to select the actions that maximize the accumulated reward. A model of the environment is not required in the Q-learning process, which facilitates the implementation of the approach. Q-learning has become the basis of many learning algorithms and has demonstrated strong learning ability.

The purpose of the Q-learning approach is to obtain a proper strategy for selecting the action $a \in A$ in a specific state $s \in S$, where $A$ and $S$ are the action space and the state space, respectively. To this end, the Q-function $Q^{(k)}(s, a)$ is updated iteratively as

$$Q^{(k)}(s, a) = (1 - α) Q^{(k-1)}(s, a) + α \left[ R(s, a) + γ \max_{a' \in A} Q^{(k-1)}(s', a') \right], \tag{36}$$

where $R(s, a)$ is the immediate reward for the action $a$ performed in the state $s$, $0 < α \leq 1$ is the learning rate, $0 < γ \leq 1$ is the discount factor and $s'$ is the state reached after the action $a$ is performed. Equation (36) shows that the Q-learning process is an incremental estimation of the Q-function $Q^{(k)}(s, a)$. In the learning process, the discount factor $γ$ weighs the accumulated reward of the reached state $s'$, represented by the previous estimate of the Q-function, against the immediate reward $R(s, a)$; the effect of the accumulated reward on the estimate of the Q-function increases with $γ$. The learning rate $α$ weighs the previous estimate $Q^{(k-1)}(s, a)$ against its update, as shown in (36); the effect of the update increases with $α$.

To guarantee the efficiency of the Q-learning approach, it is important to design the action space $A$, the state space $S$ and the reward $R(s, a)$ properly. According to the Q-function $Q^{(k)}(s, a)$, the selected action $a_{max}$ for the next iteration can be determined as

$$a_{max} \leftarrow \arg\max_{a \in A} Q^{(k)}(s, a). \tag{37}$$

As the Q-function represents the accumulated reward, the action selection strategy shown in Equation (37) is beneficial to maximize the accumulated reward.

3.2. Q-Learning Based Covariance Tuning

The model-based KF and the data-driven Q-learning approach are combined to design the QKF algorithm, where the Q-learning approach replaces the recursive estimation of the AKF algorithm for determining the process noise covariance matrix. To simplify the Q-learning process, only the sub-matrix $Q_{c k}$ is tuned in the presented algorithm. The proposed framework of the QKF is shown in Figure 3.

Figure 3. Q-learning Kalman filtering framework.

The Q-learning approach is implemented based on the measurement data obtained from the ADS on the master spacecraft and the GU on the slave spacecraft. Once the sub-matrix corresponding to the calibration parameter vector has been selected as $\hat{Q}_{c k}$ via the Q-learning approach, it is plugged into the transfer alignment KF of Algorithm 1 for state estimation. The process noise covariance matrix of the transfer alignment KF is then formulated as

$$\hat{Q}_{k} = \begin{bmatrix} Q_{a k} & 0 & 0 \\ 0 & Q_{r k} & 0 \\ 0 & 0 & \hat{Q}_{c k} \end{bmatrix}. \tag{38}$$

This differs from the standard KF in that the fine-tuned $\hat{Q}_{k}$, instead of the original $Q_{k}$, is used to calculate the gain matrix $K_{k}$. In other application scenarios where all sub-matrices in $Q_{k}$ need to be tuned, a parallel Q-learning-based filtering algorithm similar to the method developed in [40] could be adopted.

To select an appropriate process noise covariance matrix for better filtering performance, the state space $S$ of the designed Q-learning approach is constructed from different design values $Q_{c k}^{(s)}$ of the sub-matrix $Q_{c k}$. Each state $s$ corresponds to an element of the pre-determined set $\{\dots, Q_{c k}^{(s)}, \dots\}$. The action space $A$ consists of state transition actions: moving to an adjacent state or staying at the current state. Furthermore, the immediate reward $R(s, a)$ is constructed from the measurement innovation of an explorative KF designed based on the system model in (26) and (30). The structure and parameters of the explorative KF are similar to those of the transfer alignment KF, except that the process noise covariance matrix $\bar{Q}_{k}^{(s)}$ related to the current state is adopted. Specifically, the process noise covariance matrix in the explorative KF is formulated as

$$\bar{Q}_{k}^{(s)} = \begin{bmatrix} Q_{a k} & 0 & 0 \\ 0 & Q_{r k} & 0 \\ 0 & 0 & Q_{c k}^{(s)} \end{bmatrix}. \tag{39}$$
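The state and action spaces described above can be sketched as follows in Python/NumPy. The candidate set is a hypothetical choice: a log-spaced grid of scalar multiples of $I_{9 \times 9}$, which this section does not specify. Likewise, treating a move past either end of the grid as "stay" is an assumption, and all helper names are illustrative.

```python
import numpy as np

# Hypothetical candidate set {Q_c^(s)} = {q_s * I_9}; the log-spaced grid is an
# illustrative assumption, not the set used in the paper.
CANDIDATES = np.logspace(-16, -8, num=9)
ACTIONS = (-1, 0, +1)   # move to the lower neighbour, stay, move to the upper neighbour

def transit(s, a_idx):
    """Apply ACTIONS[a_idx] to state index s; at the grid edges a move acts as 'stay'."""
    return int(np.clip(s + ACTIONS[a_idx], 0, len(CANDIDATES) - 1))

def explorative_Q(Qa, Qr, s):
    """Eq. (39): Q-bar^(s) with the calibration block replaced by the s-th candidate."""
    Qbar = np.zeros((15, 15))
    Qbar[0:3, 0:3] = Qa
    Qbar[3:6, 3:6] = Qr
    Qbar[6:15, 6:15] = CANDIDATES[s] * np.eye(9)
    return Qbar
```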

In the Q-learning process, when the state transits from $s$ to $s'$ after the action $a$ is performed, the reward is designed as

$$R(s, a) = (\tilde{y}_{e}^{(s)})^{T} \tilde{y}_{e}^{(s)} - (\tilde{y}_{e}^{(s')})^{T} \tilde{y}_{e}^{(s')}, \tag{40}$$

where $\tilde{y}_{e}^{(s)}$ is the measurement innovation of the explorative KF with $\bar{Q}_{k}^{(s)}$ used instead of $Q_{k}$. From Equation (40), a positive reward is obtained if the action $a$ reduces the measurement innovation; conversely, an action that increases the measurement innovation yields negative feedback for the agent. Since the measurement innovation is an indicator of filtering performance, the sub-matrix $\hat{Q}_{c k}$, selected by the Q-learning approach through maximization of the cumulative reward represented by the Q-function $Q(s, a)$, is expected to improve the state estimation accuracy. Note that the superscript $(k)$ of the Q-function $Q^{(k)}(s, a)$ is omitted hereafter to simplify the notation.
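Equation (40) only needs the innovations of the explorative KF before and after the transition. A minimal Python/NumPy sketch with a hypothetical helper name and toy innovation vectors:

```python
import numpy as np

def innovation_reward(innov_s, innov_s_next):
    """Eq. (40): decrease in the squared innovation norm when moving from s to s'."""
    a = np.asarray(innov_s, dtype=float)
    b = np.asarray(innov_s_next, dtype=float)
    return float(a @ a - b @ b)

print(innovation_reward([3.0, 4.0], [0.0, 1.0]))   # 24.0 (innovation shrank: positive reward)
print(innovation_reward([0.0, 1.0], [3.0, 4.0]))   # -24.0 (innovation grew: negative reward)
```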

Following the above description, the Q-learning process used in the QKF algorithm to obtain $\hat{Q}_{ck}$ is presented in Algorithm 2. To balance exploration and exploitation in the Q-learning process, the $\varepsilon$-greedy strategy [43] is adopted as a modification of the basic action selection strategy shown in Equation (37).

Algorithm 2: Q-learning process in QKF
1: $\hat{x}_{e0} \leftarrow \hat{x}_0$, $P_{e0} \leftarrow P_0$
2: Initialize parameter set $\{\dots, Q_{ck}^{(s)}, \dots\}$
3: for all $s \in S$, $\bar{Q}_k^{(s)}(1{:}6, 1{:}6) \leftarrow Q_k(1{:}6, 1{:}6)$, $\bar{Q}_k^{(s)}(7{:}15, 7{:}15) \leftarrow Q_{ck}^{(s)}$
4: for all $s \in S$, $a \in A$, $Q(s, a) \leftarrow 0$
5: Initialize state $s$
6: $\tilde{y}_e^{(s)} \leftarrow \mathrm{sqrt}(\mathrm{eig}(H_0 P_{e0} H_0^T + R_0))$
7: for $k = 1, 2, \dots, K$ do
8: $a \leftarrow \varepsilon\text{-greedy}(s, Q(s, a), A, \varepsilon)$
9: Perform $a$ and observe the reached state $s'$
10: $[\hat{x}_{ek}, P_{ek}, \tilde{y}_e^{(s')}] \leftarrow \mathrm{KF}(\hat{x}_{e,k-1}, P_{e,k-1}, y_k, \bar{Q}_k^{(s')}, R_k)$
11: $R(s, a) \leftarrow (\tilde{y}_e^{(s)})^T \tilde{y}_e^{(s)} - (\tilde{y}_e^{(s')})^T \tilde{y}_e^{(s')}$
12: $Q(s, a) \leftarrow (1 - \alpha) Q(s, a) + \alpha [R(s, a) + \gamma \max_{a' \in A} Q(s', a')]$
13: $s \leftarrow s'$
14: $\tilde{y}_e^{(s)} \leftarrow \tilde{y}_e^{(s')}$
15: end for
16: $\hat{Q}_{ck} \leftarrow Q_{ck}^{(s)}$
17: return $\hat{Q}_{ck}$

In the algorithm, $\hat{x}_{ek}$ and $P_{ek}$ are the state estimate and the estimation error covariance matrix of the explorative KF, and $\varepsilon$ is the probability with which the $\varepsilon$-greedy strategy selects a random action. The function $\mathrm{eig}(\cdot)$ returns the eigenvalues of a square matrix, and the function $\mathrm{sqrt}(\cdot)$ returns the element-wise square root of a vector. The function $\varepsilon\text{-greedy}(\cdot)$ denotes the $\varepsilon$-greedy strategy, in which the agent selects a random action from the action space $A$ with probability $\varepsilon$ and, with probability $1 - \varepsilon$, selects the action that maximizes the Q-function, as shown in Equation (37). The function $\mathrm{KF}(\cdot)$ denotes one cycle of the KF equations, which are similar to the prediction and update equations in Algorithm 1. The output $\hat{Q}_{ck}$ of the algorithm is used in the transfer alignment KF.
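A minimal Python sketch of the tabular pieces of Algorithm 2 is given below: the $\varepsilon$-greedy selection (line 8), the state transition (line 9), the Q-function update (line 12) and the initial innovation of line 6. Encoding the action space as {move down, stay, move up} over an ordered list of candidate matrices, and clamping at the bounds of $S$, are assumptions made for illustration; all names are hypothetical.

```python
import numpy as np

ACTIONS = (-1, 0, +1)   # assumed encoding: lower neighbour, stay, upper neighbour


def epsilon_greedy(Q, s, eps, rng):
    """Line 8: random action with probability eps, otherwise argmax_a Q(s, a)."""
    if rng.random() < eps:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(Q[s]))          # ties go to the first maximizing action


def next_state(s, a, n_states):
    """Line 9: apply action index a and clamp at the bounds of S (assumption)."""
    return min(max(s + ACTIONS[a], 0), n_states - 1)


def q_update(Q, s, a, reward, s_next, alpha, gamma):
    """Line 12: Q(s,a) <- (1-alpha) Q(s,a) + alpha [R + gamma max_a' Q(s',a')]."""
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * (reward + gamma * np.max(Q[s_next]))
    return Q


def initial_innovation(H0, P0, R0):
    """Line 6: element-wise sqrt of the eigenvalues of H0 P0 H0^T + R0 (symmetric)."""
    return np.sqrt(np.linalg.eigvalsh(H0 @ P0 @ H0.T + R0))


# Usage with the Section 5 settings alpha = 0.2, gamma = 0.8 and 11 states
rng = np.random.default_rng(0)
Q = np.zeros((11, len(ACTIONS)))
s = 5
a = epsilon_greedy(Q, s, eps=0.5, rng=rng)
s_next = next_state(s, a, Q.shape[0])
Q = q_update(Q, s, a, reward=1.0, s_next=s_next, alpha=0.2, gamma=0.8)
```

Because `np.argmax` breaks ties toward the first entry, an all-zero table makes the greedy choice "move down"; a random tie-break is a common alternative.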

In general, the computational load of the QKF depends on the number of parallel KFs in the algorithm. Previous works indicate that only a few explorative KFs are sufficient to improve the filtering performance noticeably. As shown in Figure 3 and Algorithm 2, the QKF in this paper contains only one explorative KF and one transfer alignment KF, so its computational load is roughly twice that of the standard KF. The QKF computation can therefore be completed easily within the measurement update interval, and this moderate increase in computational load is affordable for current onboard computers.

To ensure the efficiency of the QKF algorithm, the bounds of the state space $S$ must be designed appropriately. A dynamic state space whose bounds are stretched automatically is expected to be developed in future work. Building on the QKF presented in Algorithm 2, the measurement innovation sequence within a time window could also be taken into account to suppress the unfavorable effect of measurement noise [40]. Although the process noise covariance matrix obtained from the algorithm may not be globally optimal, the approach is simple and often effective in improving the filtering performance.

4. CRLB of Transfer Alignment System

The feasibility of the inter-spacecraft rapid transfer alignment method based on the attitude plus angular rate matching scheme is analyzed through the CRLB. For a given system model, the CRLB is a theoretical lower bound on the achievable state estimation error covariance, so it allows the potential performance of the transfer alignment method to be assessed before the filtering algorithm is simulated numerically. For the linear discrete-time stochastic system formulated in Equations (26) and (30), the CRLB is calculated as described in Algorithm 3.

Algorithm 3: Calculation of CRLB
1: $J_0 \leftarrow P_0^{-1}$
2: for $k = 1, 2, \dots, K$ do
3: $J_k \leftarrow (F_k J_{k-1}^{-1} F_k^T + Q_k)^{-1}$
4: if the measurement $y_k$ is available then
5: $J_k \leftarrow J_k + H_k^T R_k^{-1} H_k$
6: end if
7: $\hat{P}_k \leftarrow J_k^{-1}$
8: end for
9: return $\{\hat{P}_k\}$

In the algorithm, $J_k$ is the Fisher information matrix calculated with the considered system model. The square roots of the diagonal elements of the calculated matrix $\hat{P}_k$ provide the theoretical bound on the root mean square (RMS) error of the state estimate.
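Algorithm 3 maps directly onto a few lines of NumPy. The sketch below is a minimal illustration that assumes the time-varying matrices are supplied as sequences and that a missing measurement at step $k$ is signalled by `None`; the function name `crlb_bounds` and the scalar test case are hypothetical.

```python
import numpy as np


def crlb_bounds(P0, F_seq, Q_seq, H_seq, R_seq):
    """Recursive CRLB of Algorithm 3; returns [P_hat_1, ..., P_hat_K]."""
    J = np.linalg.inv(P0)                                   # line 1
    bounds = []
    for F, Q, H, R in zip(F_seq, Q_seq, H_seq, R_seq):
        J = np.linalg.inv(F @ np.linalg.inv(J) @ F.T + Q)   # line 3: prediction
        if H is not None:                                   # line 4: measurement available
            J = J + H.T @ np.linalg.solve(R, H)             # line 5: add H^T R^-1 H
        bounds.append(np.linalg.inv(J))                     # line 7
    return bounds


# Scalar constant state observed directly (hypothetical): P_hat_k = 1 / (1 + k)
I1 = np.eye(1)
bounds = crlb_bounds(I1, [I1] * 3, [0 * I1] * 3, [I1] * 3, [I1] * 3)
rms_bound = [float(np.sqrt(b[0, 0])) for b in bounds]   # square roots of the diagonal
print([round(float(b[0, 0]), 4) for b in bounds])        # [0.5, 0.3333, 0.25]
```

For a constant scalar state with unit prior and unit measurement noise, each measurement adds one unit of information, which is why the bound falls as $1/(1+k)$.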

The CRLB analysis is carried out under the following conditions. The master spacecraft is an Earth satellite in an orbit with an altitude of 700 km, and its attitude is kept Earth-pointing. The slave spacecraft is mounted on the master spacecraft. For the ADS on the master spacecraft, the attitude determination accuracy is 3″ and the angular rate determination accuracy is 0.02°/h. For the GU on the slave spacecraft, the ARW and RRW coefficients are $4 \times 10^{-4}\,^{\circ}/\mathrm{h}^{1/2}$ and $1 \times 10^{-3}\,^{\circ}/\mathrm{h}^{3/2}$, respectively. The scale factor error parameter vector is set as $[500\ 500\ 500]^T$ ppm (parts per million), and the misalignment parameter vector is set as $[50''\ 50''\ 50''\ 50''\ 50''\ 50'']^T$. The update rate of the filter is 1 Hz, and the total simulation time of the transfer alignment is 3600 s. The calibration maneuver consists of sequential rotations about the three orthogonal axes of the master spacecraft body frame. Generally, the transfer alignment performance improves as the angular rate of the calibration maneuver increases. However, since the dynamic measurement performance of a typical star sensor may degrade when its angular rate exceeds 2°/s, the rotation angular rate of the master spacecraft is set as 1°/s.
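To make the calibration maneuver concrete, the sketch below generates the commanded body angular rate for sequential rotations at 1°/s, with each axis held for a configurable duration. The x-then-y-then-z order, the start time and the omission of the orbital rate of the Earth-pointing master spacecraft are assumptions; the function name `maneuver_rate` is hypothetical.

```python
import numpy as np


def maneuver_rate(t, rate_deg_s=1.0, t_axis=100.0, t_start=0.0):
    """Commanded body rate (deg/s) at time t for a sequential x -> y -> z rotation.

    Assumptions: the rotation order, t_start, and no orbital-rate term.
    """
    omega = np.zeros(3)
    tau = t - t_start
    if 0.0 <= tau < 3.0 * t_axis:
        omega[int(tau // t_axis)] = rate_deg_s   # only one axis rotates at a time
    return omega


print(maneuver_rate(150.0))   # [0. 1. 0.]  (second axis active)

# Integrate at the 1 Hz filter rate: each axis turns rate * t_axis = 100 deg
angle = sum(maneuver_rate(t) for t in np.arange(0.0, 400.0, 1.0))
print(angle)                  # [100. 100. 100.]
```

With `t_axis` set to 200 s, 400 s or 600 s, the same function reproduces the longer per-axis durations considered in the CRLB study.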

The CRLB for the inter-spacecraft rapid transfer alignment system described in Section 2 is calculated using Algorithm 3. Figure 4, Figure 5 and Figure 6 show the theoretical error bounds of the calibration parameters when the rotation about each axis lasts 100 s, 200 s, 400 s and 600 s.

Figure 4. Estimation error bound for $\lambda_{xk}$, $\lambda_{yk}$ and $\lambda_{zk}$ with different maneuver time.

Figure 5. Estimation error bound for $\delta_{xyk}$, $\delta_{xzk}$ and $\delta_{yxk}$ with different maneuver time.

Figure 6. Estimation error bound for $\delta_{yzk}$, $\delta_{zxk}$ and $\delta_{zyk}$ with different maneuver time.

According to the CRLB curves, a shorter rotation time about each axis benefits the rapidity of the transfer alignment. Based on these results, the rotation about each axis in the calibration maneuver is set to last 100 s in the following numerical simulation.

5. Simulation Results

To evaluate the performance of the inter-spacecraft rapid transfer alignment method based on the QKF, simulation results are presented in this section. The sensor data are generated under the same simulation conditions as in Section 4. In the transfer alignment KF, the initial state estimation error covariance matrix is set as

$$P_0 = \begin{bmatrix} p_a^2 I_{3\times3} & 0 & 0 \\ 0 & p_r^2 I_{3\times3} & 0 \\ 0 & 0 & \mathrm{diag}(p_c)^2 \end{bmatrix}, \tag{41}$$

where $p_a = 0.05^{\circ}$ and $p_r = 0.05^{\circ}/\mathrm{h}$. Each element of $p_c$ is set larger than three times the magnitude of the corresponding calibration parameter given in Section 4. As in the attitude determination KF [21], the sub-matrices $Q_{ak}$ and $Q_{rk}$ of the process noise covariance matrix $Q_k$ are designed according to the ARW and RRW coefficients of the GU. The sub-matrix $Q_{ck}$ is set as $Q_{ck} = q_c^2 I_{9\times9}$, where the magnitude of the parameter $q_c$ is $10^{-5}$. In the measurement noise covariance matrix $R_k$, the sub-matrices are set as $R_{\rho k} = r_\rho^2 I_{3\times3}$ and $R_{\omega k} = r_\omega^2 I_{3\times3}$, where $r_\rho = 3''$ and $r_\omega = 0.02^{\circ}/\mathrm{h}$. For the inter-spacecraft rapid transfer alignment, the initial attitude estimate $\hat{q}_0$ is obtained from the ADS on the master spacecraft, and the components of the state vector related to the calibration parameters are initialized to zero.
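Because the noise and initial-uncertainty values above are quoted in arcseconds, degrees and degrees per hour, a filter implementation has to convert them to consistent units. The Python sketch below performs this conversion to radians and rad/s. It assumes a block-diagonal $R_k$ with $R_{\rho k}$ first, and it picks $p_c$ as 3.5 times the Section 4 magnitudes purely as a hypothetical value satisfying "larger than triple"; the paper does not state the exact elements.

```python
import numpy as np

ARCSEC = np.pi / (180.0 * 3600.0)   # rad per arcsecond
DEG = np.pi / 180.0                 # rad per degree
HOUR = 3600.0                       # s per hour

# Measurement noise: r_rho = 3 arcsec, r_omega = 0.02 deg/h
r_rho = 3.0 * ARCSEC                # ~1.454e-5 rad
r_omega = 0.02 * DEG / HOUR         # ~9.696e-8 rad/s
R_k = np.diag([r_rho**2] * 3 + [r_omega**2] * 3)   # assumed block order [R_rho, R_omega]

# Initial uncertainty: p_a = 0.05 deg, p_r = 0.05 deg/h
p_a = 0.05 * DEG
p_r = 0.05 * DEG / HOUR
# Hypothetical p_c: 3.5x the Section 4 magnitudes (500 ppm scale factor, 50 arcsec misalignment)
p_c = np.array([3.5 * 500e-6] * 3 + [3.5 * 50.0 * ARCSEC] * 6)
P_0 = np.diag(np.concatenate([[p_a**2] * 3, [p_r**2] * 3, p_c**2]))   # Eq. (41)

print(f"{r_rho:.4e} rad, {r_omega:.4e} rad/s, P_0 shape {P_0.shape}")
```

Keeping every block in radians (and dimensionless scale factors) avoids mixing arcseconds with degrees per hour inside a single covariance matrix.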

The first simulation evaluates the attitude plus angular rate matching scheme presented in Section 2 by comparing it with the traditional attitude matching scheme. The average RMS errors of the attitude and calibration parameter estimates obtained with the two schemes over 10 independent trials are plotted together in Figure 7, Figure 8 and Figure 9.

Figure 7. Attitude estimation errors of different measurement matching schemes.

Figure 8. Scale factor estimation errors of different measurement matching schemes.

Figure 9. Misalignment estimation errors of different measurement matching schemes.

Figure 7, Figure 8 and Figure 9 show that the presented scheme performs better than the traditional scheme, because the additional angular rate measurements provide more information for the transfer alignment. These results indicate that the presented attitude plus angular rate matching scheme is effective for inter-spacecraft transfer alignment.

The second simulation illustrates the performance of the QKF presented in Section 3. In the QKF algorithm, the Q-learning parameters are set as $\alpha = 0.2$, $\gamma = 0.8$ and $\varepsilon = 0.5$. In general, these parameters can be tuned by trial and error in numerical simulation. For the noise covariance adaptation problem in filtering algorithms, previous works [39,44] found that the design parameters have no significant influence as long as they are chosen within certain ranges. In the pre-determined set associated with the state space $S$ for tuning the process noise covariance matrix, the sub-matrix $Q_{ck}^{(s)}$ is set as

$$Q_{ck}^{(s)} = (\lambda^{(s)})^2 q_c^2 I_{9\times9}, \tag{42}$$

where $\lambda^{(s)}$ is a scalar factor taking 11 different values in the interval $[1, 100]$. The cardinality of the pre-determined set is small, which helps the Q-learning approach reach a reasonable result, since fewer state-action pairs have to be explored. In fact, for the considered scenario, a small parameter set is sufficient to improve the filtering performance. Nevertheless, more sophisticated Q-learning approaches may be required for more complicated problems.
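The candidate set of Equation (42) can be generated as below. The paper states only that $\lambda^{(s)}$ takes 11 values in $[1, 100]$; the logarithmic spacing used here is an assumption, chosen so that each move to an adjacent state scales $Q_{ck}^{(s)}$ by the same factor. The variable names are hypothetical.

```python
import numpy as np

q_c = 1e-5                                   # magnitude of q_c used in Section 5
lambdas = np.logspace(0.0, 2.0, 11)          # assumed spacing: 1, 10**0.2, ..., 100
candidate_Qc = [(lam * q_c) ** 2 * np.eye(9) for lam in lambdas]   # Eq. (42)

step = lambdas[1] / lambdas[0]               # ratio between adjacent lambdas
print(round(float(step), 3), round(float(step**2), 3))   # 1.585 2.512
```

Under this assumed spacing, moving to an adjacent state multiplies $\lambda^{(s)}$ by about 1.585 and the diagonal of $Q_{ck}^{(s)}$ by about 2.51.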

The estimation error curves of the attitude and the calibration parameters, together with the corresponding $\pm 3\sigma$ error bounds computed from the filter's error covariance matrix, are shown in Figure 10, Figure 11 and Figure 12.

Figure 10. Attitude estimation error of transfer alignment based on QKF.

Figure 11. Scale factor estimation error of transfer alignment based on QKF.

Figure 12. Misalignment estimation error of transfer alignment based on QKF.

The figures show that the estimation errors of the calibration parameters diminish rapidly once the calibration maneuver is performed, and both the scale factor and the misalignment estimation errors converge in about 600 s. All the estimation error curves of the QKF remain within the corresponding error bounds, which indicates that the filtering algorithm is consistent. The simulation results illustrate that all the systematic errors of the GU on the slave spacecraft included in the system model can be calibrated rapidly and accurately with the ADS on the master spacecraft. They also show that the QKF is feasible for inter-spacecraft rapid transfer alignment.

To facilitate the comparison, the average RMS errors of different transfer alignment methods are listed in Table 1.

Table 1. Average RMS error of state estimation error for transfer alignment.

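Table 1 shows that, with the standard KF, the presented attitude plus angular rate matching scheme yields lower average RMS errors than the traditional attitude matching scheme for the attitude, scale factor and misalignment estimates. The state estimation accuracy is further improved when the presented QKF is used instead of the standard KF.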
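The improvements in Table 1 can be quantified as relative reductions of the average RMS errors. The short Python calculation below uses only the tabulated values; the dictionary layout and the function name `reduction_pct` are illustrative.

```python
# Average RMS errors from Table 1: (attitude [arcsec], scale factor [ppm], misalignment [arcsec])
table1 = {
    ("Attitude", "KF"): (0.78, 140.17, 23.14),
    ("Attitude + angular rate", "KF"): (0.42, 113.40, 13.91),
    ("Attitude + angular rate", "QKF"): (0.31, 69.80, 6.30),
}


def reduction_pct(before, after):
    """Relative reduction (%) of each error component, rounded to 0.1 %."""
    return tuple(round(100.0 * (b - a) / b, 1) for b, a in zip(before, after))


scheme_gain = reduction_pct(table1[("Attitude", "KF")],
                            table1[("Attitude + angular rate", "KF")])
filter_gain = reduction_pct(table1[("Attitude + angular rate", "KF")],
                            table1[("Attitude + angular rate", "QKF")])
print(scheme_gain)   # (46.2, 19.1, 39.9)
print(filter_gain)   # (26.2, 38.4, 54.7)
```

On these averages, adding the angular rate measurements lowers the attitude error by about 46%, and the QKF removes a further 26% to 55% of the error relative to the KF, with the largest gain for misalignment.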

Furthermore, to demonstrate the advantage of the QKF algorithm, the inter-spacecraft rapid transfer alignment is simulated with different filtering algorithms, namely the standard KF, the Sage-Husa AKF and the QKF. For a fair comparison, the three filtering algorithms share the same basic filtering parameters, including the initialization parameters, the state transition matrix, the measurement matrix and the initial noise covariance matrices. The average RMS errors of the KF, the AKF and the QKF for the estimation of the attitude and the calibration parameters over 10 independent trials are plotted in Figure 13, Figure 14 and Figure 15.

Figure 13. Attitude estimation errors of different filtering algorithms.

Figure 14. Scale factor estimation errors of different filtering algorithms.

Figure 15. Misalignment estimation errors of different filtering algorithms.

These figures show that the QKF achieves the highest state estimation accuracy among the three algorithms, for both the attitude and the calibration parameters. This indicates that the QKF is more effective than the AKF in identifying an appropriate process noise covariance matrix, which in turn improves the filtering performance.

To examine the performance of the presented QKF algorithm in different scenarios, numerical simulations are performed with different measurement noise levels. For the inter-spacecraft rapid transfer alignment system, Figure 16 plots the average attitude estimation RMS errors of the KF and the QKF over 10 independent trials as the standard deviation of the attitude measurement noise increases from 1″ to 10″. The results indicate that the QKF algorithm is less sensitive to variations in the measurement noise level.

Figure 16. Attitude estimation errors of KF and QKF with different measurement noise.

In summary, the simulation results show that the presented inter-spacecraft rapid transfer alignment method can satisfy the rapidity and accuracy requirements of on-orbit calibration. The presented attitude plus angular rate matching scheme performs better than the traditional one. The QKF has considerable potential for practical space missions, as both the KF algorithm and the Q-learning approach are familiar to aerospace engineers. The basic principle of the presented method can be further applied to calibrate other attitude sensors, although the system model must be modified for each specific application.

6. Conclusions

Accurate and rapid transfer alignment of the GU on the slave spacecraft is critical, as it influences the performance of the slave spacecraft after it is released from the master spacecraft. To satisfy the requirements of such space missions, this paper presents a novel inter-spacecraft rapid transfer alignment method based on the attitude plus angular rate matching scheme. To fine tune the process noise covariance matrix of the transfer alignment KF, the Q-learning approach is combined with the model-based KF, resulting in the QKF algorithm. The method is evaluated through simulations. The simulation results demonstrate that the presented method can estimate all the calibration parameters in the transfer alignment model, and that a misalignment estimation accuracy of less than 2 mrad is achievable within 10 min. The attitude plus angular rate matching scheme outperforms the traditional attitude matching scheme, and the QKF achieves higher estimation accuracy than the KF and AKF algorithms. Further work is required to evaluate the efficiency of the presented method in practical applications and to optimize the algorithm design according to the real error behavior. Although demonstrated for space missions, the presented method could be extended in future research to information fusion and error calibration on other platforms.

Author Contributions

Conceptualization, K.X. and P.Z.; methodology, K.X.; software, P.Z.; validation, K.X., P.Z. and X.H.; formal analysis, K.X.; writing—original draft preparation, K.X.; writing—review and editing, P.Z.; supervision, X.H.; project administration, K.X.; funding acquisition, X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 62394354 and 62394350, and the Key Laboratory Fund Project, grant number 2023-JCJQ-LB-006-03.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Robson, D.J.; Cappelletti, C. Biomedical payload: A maturing application for CubeSats. Acta Astronaut. 2022, 191, 394–403.
[2] Zhao, Y.; Yuan, L.; Wang, X.L.; Huang, X.Y.; Liu, W.W.; Hua, B.C.; Li, M.D.; Xu, L.J.; Wang, Y.P.; Hao, C.; et al. Design and implementation of GNC system for entry capsule of Tianwen-1 probe. J. Astronaut. 2022, 43, 1–10.
[3] Frauenholz, R.B.; Bhat, R.S.; Chesley, S.R.; Mastrodemos, N.; Owen, W.M., Jr.; Ryne, M.S. Deep impact navigation system performance. J. Spacecr. Rocket. 2008, 45, 39–56.
[4] Chen, B.; Li, X.; Zhang, G.; Guo, Q.; Wu, Y.; Wang, B.; Chen, F. On-orbit installation matrix calibration and its application on AGRI of FY-4A. J. Appl. Remote Sens. 2020, 14, 024507.
[5] Wang, P.; Li, Q.; Xu, Y.; Zhang, Y.; Xi, X.; Wu, Y.; Wu, X.; Xiao, D. Calibration of coupling errors for scale factor nonlinearity improvement in navigation-grade honeycomb disk resonator gyroscope. IEEE Trans. Ind. Electron. 2023, 70, 5347–5355.
[6] Wang, A.; Gu, D.; Huang, Z.; Liu, C.; Shao, K.; Tong, L. GRACE-FO attitude determination: Star camera installation matrix calibration and incremental quaternion integrator. Acta Astronaut. 2024, 219, 774–784.
[7] Valles, A.E.; Alva, V.R.; Belokonov, I.B. Calibration method of MEMS gyroscopes using a robot manipulator. IEEE Aerosp. Electron. Syst. Mag. 2023, 38, 20–27.
[8] Yang, A.; Hu, P.; Liu, G.; Zhang, R.; Wu, Q.; Zhou, R. Novel attitude determination method by integration of electronic level meter, INS, and low-cost turntable for level attitude evaluation and calibration of INS. Chin. J. Aeronaut. 2023, 36, 486–495.
[9] Yang, Z.; Zhu, X.; Cai, Z.; Chen, W.; Yu, J. A real-time calibration method for the systematic errors of a star sensor and gyroscope units based on the payload multiplexed. Optik 2021, 225, 165731.
[10] Pittelkau, M.E. Calibration and attitude determination with redundant inertial measurement units. J. Guid. Control. Dyn. 2005, 28, 743–752.
[11] Lefferts, E.J.; Markley, F.L.; Shuster, M.D. Kalman filtering for spacecraft attitude estimation. J. Guid. Control. Dyn. 1982, 5, 417–429.
[12] Tissera, M.S.C.; Foo, K.J.E.; Low, K.S.; Goh, S.T.; Tan, R.D. ROEKF-MPC estimator for satellite attitude and gyroscope bias estimation. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 4870–4882.
[13] Zhang, Z.F.; Lin, H.Z.; Li, G.J. A satellite attitude determination method based on virtual star sensor. Aerosp. Control. Appl. 2024, 50, 8–16.
[14] Springmann, J.C.; Cutler, J.W. Flight results of a low-cost attitude determination system. Acta Astronaut. 2014, 99, 201–214.
[15] Porras-Hermoso, A.; Cubas, J.; Pindado, S. On the satellite attitude determination using simple environmental models and sensor data. J. Phys. Conf. Ser. 2021, 2090, 012116.
[16] Zhang, X.; Li, J. On-orbit self-calibration method of non-coplanar gyros based UDKF. Chin. Space Sci. Technol. 2021, 41, 104–111.
[17] Pittelkau, M.E. Everything is relative in spacecraft system alignment calibration. J. Spacecr. Rocket. 2002, 39, 460–466.
[18] Qu, C.; Li, J.; Zhang, W. Improved integrated navigation method of micro position and orientation system based on installation error angle calibration. Meas. Sci. Technol. 2022, 33, 095020.
[19] Li, J.; Zhang, H.H.; Zhao, Y. In-flight calibration of the gyros of the Chang’E-3 lunar lander. Sci. China Technol. Sci. 2014, 44, 582–588.
[20] Yu, D.; Dong, W.Q.; Wang, Y. Research and implementation of on-orbit self-calibration for gyroscope of circumlunar return and reentry spacecraft. Sci. China Technol. Sci. 2015, 45, 213–220.
[21] Pittelkau, M.E. Kalman filtering for spacecraft system alignment calibration. J. Guid. Control. Dyn. 2001, 24, 1187–1195.
[22] Lu, J.; Hu, M.; Yang, Y.; Dai, M. On-orbit calibration method for redundant IMU based on satellite navigation & star sensor information fusion. IEEE Sens. J. 2020, 20, 4530–4543.
[23] Kain, J.E.; Cloutier, J.R. Rapid transfer alignment for tactical weapon applications. In Proceedings of the AIAA Guidance, Navigation and Control Conference, Boston, MA, USA, 14–16 August 1989.
[24] Dui, X.; Yan, G.; Fu, Q.; Zhou, Q.; Liu, Z. A unified nonsingular rapid transfer alignment solution for tactical weapon based on matrix Kalman filter. IEEE Access 2018, 6, 78700–78709.
[25] Zhou, D.; Guo, L. Rapid transfer alignment of an inertial navigation system using a marginal stochastic integration filter. Meas. Sci. Technol. 2018, 29, 015105.
[26] Lyu, W.; Cheng, X.; Wang, J. An improved adaptive compensation H∞ filtering method for the SINS’ transfer alignment under a complex dynamic environment. Sensors 2019, 19, 401.
[27] Wang, Y.; Xu, J.; Yang, B. A new polar rapid transfer alignment method based on grid frame for shipborne SINS. IEEE Sens. J. 2022, 22, 16150–16163.
[28] Xiang, Z.; Wang, Q.; Huang, R.; Xi, G.; Nie, X.; Zhou, J. Position observation-based calibration method for an LDV/SINS integrated navigation system. Appl. Opt. 2021, 60, 7869–7877.
[29] Ju, H.; Cho, S.Y.; Park, C.G. The effectiveness of acceleration matching according to the sensor performance in shipboard rapid transfer alignment. J. Navig. 2020, 73, 1–15.
[30] Rong, J.; Xu, L.; Zhang, H.; Cong, L. Augmentation method of XNAV in Mars orbit based on Phobos and Deimos observations. Adv. Space Res. 2016, 58, 1864–1878.
[31] Li, Z.; Wang, Y.; Zheng, W. Observability analysis of autonomous navigation using inter-satellite range: An orbital dynamics perspective. Acta Astronaut. 2020, 170, 577–585.
[32] Hu, J.; Liu, J.; Wang, Y.; Ning, X. INS/CNS/DNS/XNAV deep integrated navigation in a highly dynamic environment. Aircr. Eng. Aerosp. Technol. 2023, 95, 180–189.
[33] Zhang, L.; Wang, S.; Selezneva, M.S.; Neusypin, K.A. A new adaptive Kalman filter for navigation systems or carrier-based aircraft. Chin. J. Aeronaut. 2022, 35, 416–425.
[34] Gui, M.; Yang, H.; Ning, X.; Zhao, D.J.; Chen, L.; Dai, M.Z. Variational Bayesian implicit unscented Kalman filter for celestial navigation using time delay measurement. Adv. Space Res. 2023, 71, 756–767.
[35] Or, B.; Klein, I. A hybrid model and learning-based adaptive navigation filter. IEEE Trans. Instrum. Meas. 2022, 71, 1–11.
[36] Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
[37] Li, Y.; Yang, C.; Hou, Z.; Feng, Y.; Yin, C. Data-driven approximate Q-learning stabilization with optimality error bound analysis. Automatica 2019, 103, 435–442.
[38] Hu, Z.; Gong, W. Constrained evolutionary optimization based on reinforcement learning using the objective function and constraints. Knowl.-Based Syst. 2022, 237, 107731.
[39] Xiong, K.; Wei, C.; Zhang, H. Q-learning for noise covariance adaptation in extended Kalman filter. Asian J. Control. 2021, 23, 1803–1816.
[40] Xiong, K.; Zhao, Q.; Yuan, L. Calibration method for relativistic navigation system using parallel Q-learning extended Kalman filter. Sensors 2024, 24, 6186.
[41] Lei, M.; Wyk, B.J.; Qi, Y. Online estimation of the approximate posterior Cramer-Rao lower bound for discrete-time nonlinear filtering. IEEE Trans. Aerosp. Electron. Syst. 2011, 47, 37–57.
[42] Ng, L.C.; Pines, D.J. Characterization of ring laser gyro performance using the Allan variance method. J. Guid. Control. Dyn. 1997, 20, 211–214.
[43] Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; The MIT Press: London, UK, 2018.
[44] Tao, W.; Zhang, J.; Hu, H.; Zhang, J.; Sun, H.; Zeng, Z.; Song, J.; Wang, J. Intelligent navigation for the cruise phase of solar system boundary exploration based on Q-learning EKF. Complex Intell. Syst. 2024, 2, 2653–2672.


Table 1. Average RMS error of state estimation error for transfer alignment.

Matching Scheme	Filtering Algorithm	Attitude RMS Error (″)	Scale Factor RMS Error (ppm)	Misalignment RMS Error (″)
Attitude	KF	0.78	140.17	23.14
Attitude + angular rate	KF	0.42	113.40	13.91
Attitude + angular rate	QKF	0.31	69.80	6.30

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
