Noise-Adaption Extended Kalman Filter Based on Deep Deterministic Policy Gradient for Maneuvering Targets

Li, Jiali; Tang, Shengjing; Guo, Jie

doi:10.3390/s22145389


by Jiali Li, Shengjing Tang and Jie Guo *

School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China

* Author to whom correspondence should be addressed.

Sensors 2022, 22(14), 5389; https://doi.org/10.3390/s22145389

Submission received: 5 June 2022 / Revised: 9 July 2022 / Accepted: 18 July 2022 / Published: 19 July 2022

(This article belongs to the Section Physical Sensors)


Abstract:

Although there have been numerous studies on maneuvering target tracking, few studies have focused on the distinction between unknown maneuvers and inaccurate measurements, leading to low accuracy, poor robustness, or even divergence. To this end, a noise-adaption extended Kalman filter is proposed to track maneuvering targets with multiple synchronous sensors. This filter avoids the simultaneous adjustment of the process model and measurement model without distinction. Instead, the maneuver detection based on the Dempster-Shafer evidence theory is constructed to achieve the reliable distinction between unknown maneuvers and inaccurate measurements by fusing multi-sensor information, which effectively improves the robustness of the filter. Moreover, the adaptive estimation of the process noise covariance is modeled by a Markovian decision process with a proper reward function. Deep deterministic policy gradient is designed to obtain the optimal process noise covariance by taking the innovation as the state and the compensation factor as the action. Furthermore, the recursive estimation of the measurement noise covariance is applied to modify a priori measurement noise covariance of the corresponding sensor. Finally, the fusion algorithm is developed for the global estimation. Simulation experiments are carried out in two scenarios, and simulation results illustrate the feasibility and superiority of the proposed algorithm.

Keywords:

maneuvering target; noise adaption; maneuver detection; Dempster-Shafer evidence theory; deep deterministic policy gradient

1. Introduction

With the development of highly maneuvering targets, such as hypersonic aircraft and missiles, maneuvering target tracking has recently drawn a lot of attention [1,2]. During the past few decades, many tracking methods have been developed and investigated for maneuvering targets [3,4,5]. Among the existing target tracking approaches, designing an appropriate model for the target maneuvering motion is regarded as an important part of decreasing the tracking errors caused by the model mismatch. Therefore, many maneuver models were proposed, such as the constant acceleration (CA) model [6], the Singer model [7], and the current statistical (CS) model [8]. In contrast to single model methods, the interactive multiple model (IMM) algorithm [9,10] has been widely applied because of its adaptive capability for maneuvering target tracking problems. By mapping the motion mode to a model set, IMM runs a filter in parallel for each model. Furthermore, based on the residual information and a priori information, the state estimation results of each model are weighted and synthesized to accurately achieve the maneuvering target tracking. Considering the limited performance of IMM based on the single platform, Wang et al. [11] proposed a multi-platform maneuvering target tracking algorithm based on IMM and the best linear unbiased estimate filter. Moreover, an adaptive interacting multiple model algorithm based on information-weighted consensus was proposed [12], which presented better adaptability and accuracy than the classical IMM. To cope with the missed detections and false alarms in a low-observable environment, a state-dependent IMM based on the particle filter was presented [13]. However, the above IMM algorithms usually use a fixed set of models, so a large model set is required when solving the problem of maneuvering target tracking. To alleviate this problem, the variable structure multiple model (VSMM) method has been proposed [14]. Furthermore, a VSMM version of the consensus filter was presented [15], where the expected mode augmentation was introduced to guarantee the model set adaption. In addition, the particle swarm optimization technique was utilized to improve VSMM [16]. However, for maneuvering target tracking in practice, multiple models cannot fully cover the modes of the target’s motion, which may degrade the tracking performance. Additionally, an excessive number of models also incurs a heavy computational load in many practical applications.

On the other hand, the filtering algorithm is regarded as an important issue to cope with maneuvering target tracking problems. Practically, the extended Kalman filter (EKF) [17] has been widely applied for many nonlinear systems during the past few years. EKF is developed based on the Kalman filter and introduces the first-order Taylor expansion to approximate the nonlinear function. However, the first-order Taylor expansion of EKF cannot, at times, satisfy the tracking accuracy. To this end, other nonlinear filter algorithms including the unscented Kalman filter (UKF) [18,19] and the cubature Kalman filter (CKF) [20] were proposed. By adopting the unscented transformation and spherical-radial principle, UKF and CKF can achieve second-order accuracy without calculating the Jacobian matrix. In order to further improve the robustness of the conventional filters, the strong tracking filter (STF) was proposed by Zhou et al. [21,22]. This algorithm introduces the orthogonal principle of residual sequences into the classical EKF to modify the prediction of the covariance matrix with a time-varying fading factor. When the state estimation of EKF deviates from the actual state, the time-varying fading factor reduces the effect of the old observations on the current filter estimation and increases the effect of the new observation [23]. Accordingly, the robustness and accuracy of the classical EKF were improved. In order to improve the performance of STF, UKF and CKF were introduced to replace EKF [24,25]. An adaptive strong tracking particle filter algorithm was established by combining STF and the particle filter (PF), which demonstrated that the proposed method could provide better tracking precision than the classical methods [26]. Since the traditional strong tracking filter only considers the first-order Taylor expansion, a fast-strong tracking CKF was proposed [27], which expanded the number of fading factors from one to two. Subsequently, it extracted the third-order term information and achieved second-order accuracy. To achieve higher tracking accuracy for maneuvering target tracking, many improved strong tracking cubature Kalman filters (STCKFs), such as the strong tracking spherical simplex-radial CKF [28], the fifth-degree STCKF [29], the Bayesian-based strong tracking interpolatory CKF [30], and the model-based strong tracking square-root CKF [31], have recently been proposed. In addition, Ma et al. [32] added STF into the sub-filter of IMM to overcome the low tracking accuracy when dealing with maneuvering situations. However, in practical applications, it is found that the detection and tracking ability of the STF method will be reduced when the maneuver of the target is small. As is known, the residuals of range measurements are usually much bigger than the residuals of azimuth angle and elevation angle measurements, which makes STF unequally sensitive to the different residual components. To this end, Jiang et al. [33] proposed a residual-normalized strong tracking filter (RNSTF) by designing the weight of the residual components. However, the process noise is not considered.

The aforementioned algorithms adopt the innovation, also known as the predictive residual, to realize maneuver detection and compensation, because the innovation will increase significantly when the model mismatch is caused by the unknown maneuver. However, the sensor signal may be disturbed or blocked due to the influence of a harsh environment, making the statistical characteristics of measurement noise uncertain and time varying. Inaccurate a priori measurement noise covariance can also lead to increased innovation. In that case, the aforementioned algorithms are too sensitive to inaccurate measurements, leading to filter divergence. To solve the problem of inaccurate a priori knowledge, several adaptive Kalman filters (AKF) with noise sensing capability were established [34,35]. Unfortunately, the existing methods can only achieve second-order estimation accuracy [36,37], and the adaptive KF cannot distinguish between model mismatch and inaccurate a priori measurement noise. In addition, adaptive KFs use the innovation to adjust the process noise and measurement noise simultaneously, which leads to poor stability.

Inspired by the strong representation capability of neural networks [38,39], several filters based on reinforcement learning have been studied. Hu et al. [40] designed an attitude estimator by combining Lyapunov’s method and the deep reinforcement learning algorithm. Tang et al. [41] combined the classic EKF with the deep reinforcement learning algorithm to realize the attitude estimation of the navigation system, which introduced a gain matrix of the residual and took it as the action to learn. Gao et al. [42] proposed an adaptive Kalman filter based on the deep deterministic policy gradient (DDPG) algorithm for ground vehicles, which took the integrated navigation system as the environment to obtain the process noise covariance matrix estimation. However, this adaptive filter takes the change in noise vector as the action to learn, resulting in poor filter stability. Additionally, the process noise covariance matrix of Inertial Navigation System (INS) error model will not experience a great change because it represents the inherent performance of the inertial sensor [42], which is different from the target tracking problem.

Motivated by the above investigation, a noise-adaption EKF is proposed in this paper to address the problem of maneuvering target tracking. Based on multiple synchronous sensors, the maneuver detection is constructed by utilizing Dempster-Shafer (D-S) evidence theory [43] to achieve the fused detection of multiple sensors. A Markovian decision process with a proper reward function is modeled for the adaptive estimation of the process noise covariance, and DDPG is designed to learn the compensation factor and feed it into EKF so that the improved filter can adaptively cope with the unknown maneuver. If the detection declares the occurrence of inaccurate measurement noise, the recursive measurement noise estimation is applied to the corresponding sensor. Finally, the local estimations are fused to obtain the target’s global estimation. When the unknown maneuver and mismatched measurement noise emerge, the proposed filter can correct the noise covariance adaptively. Distinct from the aforementioned algorithms, the proposed filter avoids the simultaneous adjustment of the process model and measurement model without distinction, which effectively improves the robustness of the filter. Moreover, the application of the deep deterministic policy gradient method to maneuvering target tracking problems is explored in this paper.

The remainder of this paper is organized as follows: Section 2 introduces mathematical models and the formula of EKF. Section 3 provides the maneuver detection based on D-S evidence theory in detail, and the framework of the noise-adaption EKF is presented. In Section 4, the process noise adaption based on DDPG is designed. Moreover, the recursive measurement noise estimation and the fusion algorithm are completed. Finally, simulation results and conclusions are shown in Section 5 and Section 6, respectively.

2. Problem Formulation

The process model of the target is given by

x_{k + 1} = f (x_{k}) + w_{k}

(1)

where k denotes the discrete-time index, x_{k} represents the state vector, and f (\cdot) is the state transition function. The process noise w_{k} is assumed to be zero-mean Gaussian white noise with the covariance matrix Q_{k}.

The measurement model of sensors can be represented as

z_{k + 1} = h (x_{k + 1}) + v_{k + 1}

(2)

where z_{k + 1} is the measurement vector, h (\cdot) is the measurement mapping function, and v_{k + 1} is the measurement noise, whose covariance matrix is R_{k + 1}.

Since EKF is algorithmically simple and converges quickly, it has been widely applied in nonlinear systems [17]. Specifically, EKF can be divided into two stages. The first stage is the one-step prediction based on the process model

{\hat{x}}_{k + 1 | k} = f ({\hat{x}}_{k})

(3)

P_{k + 1 | k} = F_{k + 1 | k} P_{k | k} F_{k + 1 | k}^{T} + Q_{k}

(4)

where {\hat{x}}_{k | k} is the state estimate and P_{k | k} is its covariance matrix, and {\hat{x}}_{k + 1 | k} and P_{k + 1 | k} are the corresponding predictions. The state transition matrix F_{k + 1 | k} is the Jacobian of the state transition function, i.e., the coefficient of its first-order Taylor expansion.

F_{k + 1 | k} = {\frac{\partial f (x)}{\partial x}|}_{x = {\hat{x}}_{k}}

(5)

The second stage is the one-step update based on the measurement. The state vector and its covariance matrix are updated as

K_{k + 1} = P_{k + 1 | k} H_{k + 1}^{T} {(H_{k + 1} P_{k + 1 | k} H_{k + 1}^{T} + R_{k + 1})}^{- 1}

(6)

{\hat{x}}_{k + 1} = {\hat{x}}_{k + 1 | k} + K_{k + 1} [z_{k + 1} - h ({\hat{x}}_{k + 1 | k})]

(7)

P_{k + 1 | k + 1} = (I - K_{k + 1} H_{k + 1}) P_{k + 1 | k}

(8)

where K_{k + 1} is the gain matrix, and H_{k + 1} is defined as

H_{k + 1} = {\frac{\partial h (x)}{\partial x}|}_{x = {\hat{x}}_{k + 1 | k}}

(9)
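The two stages in Equations (3)-(9) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `ekf_step`, the callables `f`, `h`, `jac_f`, `jac_h`, and the toy constant-velocity model with a range-like measurement are all assumptions introduced here.

```python
import numpy as np

def ekf_step(x, P, z, f, h, jac_f, jac_h, Q, R):
    """One EKF cycle following Equations (3)-(9)."""
    # Prediction stage, Equations (3)-(5)
    F = jac_f(x)
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # Update stage, Equations (6)-(9)
    H = jac_h(x_pred)
    innovation = z - h(x_pred)
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # gain matrix
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new, innovation, S

# Hypothetical toy model: 1D constant velocity, sensor offset 10 m from the track
dt = 1.0
F_cv = np.array([[1.0, dt], [0.0, 1.0]])
f = lambda x: F_cv @ x
jac_f = lambda x: F_cv
h = lambda x: np.array([np.hypot(x[0], 10.0)])
jac_h = lambda x: np.array([[x[0] / np.hypot(x[0], 10.0), 0.0]])

x0 = np.array([100.0, 5.0])
P0 = np.diag([25.0, 4.0])
Q = 0.01 * np.eye(2)
R = np.array([[1.0]])
z = np.array([np.hypot(105.0, 10.0) + 0.3])   # measurement with a 0.3 m error

x1, P1, gamma, S = ekf_step(x0, P0, z, f, h, jac_f, jac_h, Q, R)
```

The function also returns the innovation and its covariance, since both quantities are reused by innovation-based detection schemes.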

As the main prior information of the system, the process noise covariance Q and the measurement noise covariance R are of great significance to the estimation performance and stability of the filter. Q and R represent the confidence degree in models and measurements, respectively. If Q and R are smaller than the true noise covariances, the uncertainty range of the true state is too narrow, resulting in estimation bias. Conversely, if Q and R are larger than the true noise covariances, it may lead to filter divergence. Additionally, inaccurate Q and R can impair the estimation accuracy.

3. Maneuver Detection and the Framework

3.1. Maneuver Detection Based on D-S Evidence Theory

The key to maneuver detection is to design an appropriate detection strategy to detect the target’s maneuver precisely and timely. As stated in the introduction, the detection depending on the innovation from single-source information is not reliable. To improve the accuracy of the maneuver detection, the maneuver detection with a sliding-window structure based on D-S evidence theory is proposed in this section by introducing multiple synchronous sensors’ information.

D-S evidence theory is a fusion rule established on a nonempty finite space, which includes a limited number of subsets [43]. Through independent observations by sensors, it can fuse observation results and give a joint judgment to improve the confidence and accuracy of events. D-S evidence theory can combine the evidence more intuitively and easily, and it has the expression ability of unknown and uncertain situations by calculating the probability of the set of multiple events.

Let 2^{Θ} denote the set of all subsets of the finite space Θ, including Θ itself and the empty set Φ, and define the map m : 2^{Θ} \to [0, 1], which, for A \subset Θ, satisfies

\sum_{A \subset Θ} m (A) = 1

(10)

m (Φ) = 0

(11)

where the map m (\cdot) is called the basic probability assignment function, and m (A) is called the mass function of A, which indicates the degree of confidence for A according to the current observation.

For the problem of maneuvering target tracking, the target has two movement modes, maneuvering or not, so there are two events in the maneuver detection problem: “Maneuver” and “Normal”, denoted as A_{1} and A_{2}, respectively. However, when the degree of confidence tends to zero, unreasonable fusion results will be obtained, which is known as the Zadeh paradox [43]. To avoid the Zadeh paradox, one more event, “Uncertainty”, is added, denoted as A_{3} = \{A_{1}, A_{2}\}, and its confidence degree is set to 0.01.

Suppose there are N sensors applied in the target’s measurement mission, denoted as S_{i} (i = 1, 2, \dots, N), and the measurement of S_{i} at time k is z_{i, k}. The innovation is defined as the difference between the actual measurement value and the predictive measurement value.

γ_{i, k + 1} = z_{i, k + 1} - {\hat{z}}_{i, k + 1 | k}

(12)

where {\hat{z}}_{i, k + 1 | k} = h_{i} ({\hat{x}}_{i, k + 1 | k}) is the predictive measurement. The innovation satisfies

E [γ_{i, k + 1}] = 0

(13)

E [γ_{i, k + 1} γ_{i, k + 1}^{T}] = H_{i, k + 1} P_{i, k + 1 | k} H_{i, k + 1}^{T} + R_{i, k + 1}

(14)

Following this, the detection variable is constructed based on the innovation.

η_{i, k + 1} = γ_{i, k + 1}^{T} {(H_{i, k + 1} P_{i, k + 1 | k} H_{i, k + 1}^{T} + R_{i, k + 1})}^{- 1} γ_{i, k + 1}

(15)

Since the innovation is Gaussian, the detection variable follows a χ^{2} distribution with n degrees of freedom, where n is the dimension of the innovation vector. Therefore, the maneuver probability is defined as

P_{i, k + 1} (Maneuver) = \int_{0}^{η_{i, k + 1}} χ_{n}^{2} (x) d x

(16)

Equation (16) hypothetically attributes the innovation entirely to the unknown maneuver, so a larger innovation indicates a higher maneuver probability.
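As an illustration of Equations (15) and (16), the sketch below evaluates the detection variable and the χ^{2} cumulative probability using only the standard library and NumPy. The helper names `chi2_cdf` and `maneuver_probability` and the numerical values are assumptions made for this example; the CDF uses the standard recurrence F_{d+2}(x) = F_{d}(x) - (x/2)^{d/2} e^{-x/2} / Γ(d/2 + 1).

```python
import math
import numpy as np

def chi2_cdf(x, n):
    """CDF of the chi-square distribution with n >= 1 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    half = 0.5 * x
    # Closed-form base cases for n = 1 (odd chain) and n = 2 (even chain)
    if n % 2 == 1:
        cdf, dof = math.erf(math.sqrt(half)), 1
    else:
        cdf, dof = 1.0 - math.exp(-half), 2
    # Step the degrees of freedom up by two until n is reached
    while dof < n:
        cdf -= half ** (dof / 2) * math.exp(-half) / math.gamma(dof / 2 + 1)
        dof += 2
    return cdf

def maneuver_probability(gamma, S):
    """Detection variable of Equation (15) and probability of Equation (16)."""
    eta = float(gamma @ np.linalg.solve(S, gamma))
    return eta, chi2_cdf(eta, len(gamma))

# Hypothetical innovation (range, azimuth, elevation) and its covariance
gamma = np.array([30.0, 0.001, 0.001])
S = np.diag([100.0, 1e-6, 1e-6])
eta, p_maneuver = maneuver_probability(gamma, S)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of the innovation covariance explicitly.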

Based on the definition of the maneuver probability, the mass function of S_{i} is given as

m_{i, k + 1} (A_{1}) = P_{i, k + 1} (Maneuver)

(17)

m_{i, k + 1} (A_{2}) = 1 - P_{i, k + 1} (Maneuver) - 0.01

(18)

m_{i, k + 1} (A_{3}) = 0.01

(19)

As mentioned above, the mismatch between the real measurement noise and the a priori measurement noise can also lead to the innovation’s increase. As a result, the maneuver probability determined by the innovation of a single sensor is unreliable. A joint detection mechanism is developed by fusing the maneuver detection results of multiple synchronous sensors to improve the accuracy and reliability of the maneuver detection. For N independent pieces of evidence with N_{e} events, the fused mass function of A_{1} can be merged by D-S evidence theory as follows

\begin{aligned} m_{k + 1} (A_{1}) &= \frac{\sum_{A_{1} \cap \dots \cap A_{N_{e}} = A_{1}} m_{1, k + 1} (A_{1}) \cdot m_{2, k + 1} (A_{2}) \dots m_{N, k + 1} (A_{N_{e}})}{1 - \sum_{A_{1} \cap \dots \cap A_{N_{e}} = Φ} m_{1, k + 1} (A_{1}) \cdot m_{2, k + 1} (A_{2}) \dots m_{N, k + 1} (A_{N_{e}})} \\ &= \frac{\sum_{A_{1} \cap \dots \cap A_{N_{e}} = A_{1}} m_{1, k + 1} (A_{1}) \cdot m_{2, k + 1} (A_{2}) \dots m_{N, k + 1} (A_{N_{e}})}{\sum_{A_{1} \cap \dots \cap A_{N_{e}} \neq Φ} m_{1, k + 1} (A_{1}) \cdot m_{2, k + 1} (A_{2}) \dots m_{N, k + 1} (A_{N_{e}})} \end{aligned}

(20)

where N_{e} = 3 in this paper.
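The mass assignment of Equations (17)-(19) and the fusion of Equation (20) can be illustrated with a brute-force enumeration over all event combinations. This is a sketch under stated assumptions: the function names `sensor_mass` and `dempster_fuse`, the three example probabilities, and the clipping of the maneuver probability to keep m (A_{2}) non-negative are introduced here and are not taken from the paper.

```python
from itertools import product

A1 = frozenset({"maneuver"})
A2 = frozenset({"normal"})
A3 = A1 | A2  # "uncertainty" event {A1, A2}

def sensor_mass(p_maneuver, m_uncertain=0.01):
    """Mass function of one sensor, Equations (17)-(19)."""
    # Assumption: clip so that m(A2) cannot become negative
    p = min(max(p_maneuver, 0.0), 1.0 - m_uncertain)
    return {A1: p, A2: 1.0 - p - m_uncertain, A3: m_uncertain}

def dempster_fuse(masses):
    """Dempster's rule over N bodies of evidence, Equation (20)."""
    fused = {A1: 0.0, A2: 0.0, A3: 0.0}
    conflict = 0.0
    # Enumerate one event per sensor and multiply the corresponding masses
    for combo in product((A1, A2, A3), repeat=len(masses)):
        weight = 1.0
        for m, event in zip(masses, combo):
            weight *= m[event]
        inter = frozenset.intersection(*combo)
        if inter:
            fused[inter] += weight
        else:
            conflict += weight  # empty intersection: conflicting evidence
    return {event: w / (1.0 - conflict) for event, w in fused.items()}

# Two sensors indicate a maneuver, one does not (hypothetical probabilities)
fused = dempster_fuse([sensor_mass(0.9), sensor_mass(0.8), sensor_mass(0.2)])
maneuver_detected = fused[A1] >= fused[A2]
```

The enumeration grows as 3^{N}, which is acceptable for a handful of sensors; for larger N the rule can be applied pairwise because Dempster's rule is associative.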

Similarly, the fused mass functions of A_{2} and A_{3} can be acquired, denoted as m_{k + 1} (A_{2}) and m_{k + 1} (A_{3}), respectively. If m_{k + 1} (A_{1}) \geq m_{k + 1} (A_{2}), the target is detected to be maneuvering at time k + 1 according to the fusion of multiple sensors. Conversely, the target is not maneuvering when m_{k + 1} (A_{1}) < m_{k + 1} (A_{2}), and the mismatch of the a priori measurement noise is the main reason for the increased innovation.

However, maneuver detection based only on the current time is unstable. Therefore, a sliding window structure is introduced into the maneuver detection to improve its stability, so that the detection depends not only on the current state but also on the neighboring states.

The size of the sliding window is set as N_{s w}, and the final mass function m_{k + 1}^{f} (\cdot) can be calculated by

m_{k + 1}^{f} (A_{1}) = \begin{cases} \frac{\sum_{A_{1} \cap \dots \cap A_{N_{e}} = A_{1}} m_{k + 2 - N_{s w}} (A_{1}) \dots m_{k} (A_{N_{e}}) \cdot m_{k + 1} (A_{N_{e}})}{\sum_{A_{1} \cap \dots \cap A_{N_{e}} \neq Φ} m_{k + 2 - N_{s w}} (A_{1}) \dots m_{k} (A_{N_{e}}) \cdot m_{k + 1} (A_{N_{e}})}, & k + 1 \geq N_{s w} \\ \frac{\sum_{A_{1} \cap \dots \cap A_{N_{e}} = A_{1}} m_{1} (A_{1}) \dots m_{k} (A_{N_{e}}) \cdot m_{k + 1} (A_{N_{e}})}{\sum_{A_{1} \cap \dots \cap A_{N_{e}} \neq Φ} m_{1} (A_{1}) \dots m_{k} (A_{N_{e}}) \cdot m_{k + 1} (A_{N_{e}})}, & k + 1 < N_{s w} \end{cases}

(21)

Similarly, the final mass functions m_{k + 1}^{f} (A_{2}) and m_{k + 1}^{f} (A_{3}) can be obtained. According to the fused detection result, the tracking problem falls into one of three conditions. (a) When m_{k + 1}^{f} (A_{1}) \geq m_{k + 1}^{f} (A_{2}), the target is detected to be maneuvering, and the adaptive process noise estimation needs to be introduced to deal with unknown maneuvers. (b) If m_{k + 1}^{f} (A_{1}) < m_{k + 1}^{f} (A_{2}) and m_{i, k + 1} (A_{1}) \geq m_{i, k + 1} (A_{2}), the target is not maneuvering but the measurement noise of S_{i} is mismatched, so the measurement noise adaption of S_{i} is necessary. (c) When m_{k + 1}^{f} (A_{1}) < m_{k + 1}^{f} (A_{2}) and m_{i, k + 1} (A_{1}) < m_{i, k + 1} (A_{2}), neither of the above conditions holds, so the standard EKF can cope with the target tracking task.
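The sliding-window fusion of Equation (21) and the three conditions (a)-(c) can be sketched as follows. The class `SlidingWindowDetector`, the action labels, and the tuple layout (m(A_{1}), m(A_{2}), m(A_{3})) are hypothetical choices for this example; the pairwise rule in `combine` is Dempster's rule specialized to the three events used here, applied sequentially over the window.

```python
from collections import deque

def combine(m1, m2):
    """Dempster's rule for two mass tuples (m(A1), m(A2), m(A3)), A3 = {A1, A2}."""
    a = m1[0] * m2[0] + m1[0] * m2[2] + m1[2] * m2[0]  # intersections equal to A1
    b = m1[1] * m2[1] + m1[1] * m2[2] + m1[2] * m2[1]  # intersections equal to A2
    c = m1[2] * m2[2]                                  # intersection equal to A3
    total = a + b + c  # equals one minus the conflicting mass
    return (a / total, b / total, c / total)

class SlidingWindowDetector:
    def __init__(self, window_size):
        # Keeps at most N_sw fused mass functions; older ones are discarded
        self.window = deque(maxlen=window_size)

    def update(self, fused_mass, sensor_masses):
        """Return the final mass, the selected action, and the suspect sensors."""
        self.window.append(fused_mass)
        masses = list(self.window)
        final = masses[0]
        for m in masses[1:]:
            final = combine(final, m)  # Equation (21)
        if final[0] >= final[1]:
            return final, "adapt_Q", []  # condition (a)
        # Sensors whose own detection disagrees with the fused result
        suspects = [i for i, m in enumerate(sensor_masses) if m[0] >= m[1]]
        action = "adapt_R" if suspects else "plain_EKF"  # conditions (b) and (c)
        return final, action, suspects

detector = SlidingWindowDetector(window_size=3)
sensors = [(0.9, 0.09, 0.01), (0.1, 0.89, 0.01)]
final_1, action_1, suspects_1 = detector.update((0.2, 0.79, 0.01), sensors)
final_2, action_2, suspects_2 = detector.update((0.9, 0.09, 0.01), sensors)
```

In the first call the fused evidence says “Normal” while sensor 0 alone reports a maneuver, so only that sensor is flagged for measurement noise adaption; in the second call the windowed evidence tips toward “Maneuver”.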

3.2. The Framework of the Noise-Adaption EKF

Based on the above maneuver detection, a noise-adaption EKF is proposed to adaptively cope with the unknown maneuver and inaccurate measurement noise. The framework of the proposed filter is shown in Figure 1.

To achieve accurate compensation for unknown maneuvers and the inaccurate measurement noise covariance, multi-sensor information undergoes the maneuver detection. Whether the process noise adaption or the measurement noise adaption proceeds is determined by the maneuver detection. If the target is detected to be maneuvering, the adaptive process noise is calculated by DDPG and introduced into the filter to cope with the model mismatch caused by the unknown maneuver. If the inaccurate measurement noise covariance is detected to be the main reason for the innovation’s increase, the recursive measurement noise estimation is applied to the corresponding sensor. Subsequently, the local estimations are fused to obtain the global estimation of the target. When the unknown maneuver or mismatched measurement noise emerges, the proposed filter can estimate the maneuvering target’s state with high performance.

4. Noise-Adaption EKF for Maneuvering Targets

4.1. Process Noise Adaption Based on DDPG

For the problem of maneuvering target tracking, the main task is to find a tracking strategy to estimate the target’s state rapidly and accurately. However, the accurate mathematical models of non-cooperative targets cannot be obtained. When the target is maneuvering, the estimation error increases due to the model mismatch. To cope with the unknown maneuver, an effective method is to adaptively estimate the process noise covariance online. To this end, a process noise adaption method based on DDPG is proposed in this section. Through its self-learning and optimization capabilities, the DDPG algorithm can adaptively cope with the unknown maneuver.

DDPG is a reinforcement learning algorithm for continuous action strategies. The interactive learning process of DDPG is formalized as a Markov decision process (MDP), which can be defined by four elements: the state S = \{s_{1}, s_{2}, \dots\}, the action A = \{a_{1}, a_{2}, \dots\}, the reward R = \{r_{1}, r_{2}, \dots\}, and the state transition probability P (s_{k + 1} | s_{k}, a_{k}). The framework of DDPG is shown in Figure 2. The agent first generates an action to interact with the environment; under the joint effects of the action and the environment, a new state is generated, and the environment gives a reward to the critic. Afterwards, the critic calculates the action-value function to evaluate the given policy, and the action policy is further optimized according to the evaluation results.

The adaptive estimation of the process noise covariance is defined as

{\hat{Q}}_{k} = λ_{k} {\hat{Q}}_{k - 1}

(22)

where {\hat{Q}}_{k} is the adaptive process noise covariance, whose initial value is the a priori process noise covariance matrix, and λ_{k} > 1 is the compensation factor. This adaptive estimation form is more robust because it has fewer parameters.

Correspondingly, the prediction of the covariance matrix in Equation (4) is updated as

P_{k + 1 | k} = F_{k + 1 | k} P_{k | k} F_{k + 1 | k}^{T} + {\hat{Q}}_{k}

(23)
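A small numerical sketch may help to see how the compensation factor of Equation (22) propagates through Equation (23) and raises the filter gain. The matrices, the value λ = 4, and the helper names below are hypothetical and chosen only to make the effect visible.

```python
import numpy as np

def adapt_process_noise(Q_prev, lam):
    """Equation (22): scale the previous process noise covariance."""
    return lam * Q_prev

def predict_covariance(F, P, Q_hat):
    """Equation (23): covariance prediction with the adapted process noise."""
    return F @ P @ F.T + Q_hat

def gain(P_pred, H, R):
    """Equation (6), used here only to show the effect on the gain."""
    return P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)

# Hypothetical two-state example with a position-only measurement
F = np.eye(2)
P = np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
Q_prior = 0.1 * np.eye(2)

K_base = gain(predict_covariance(F, P, Q_prior), H, R)
Q_hat = adapt_process_noise(Q_prior, lam=4.0)   # lam > 1 during a maneuver
K_adapt = gain(predict_covariance(F, P, Q_hat), H, R)
```

With the inflated covariance, the position gain grows from about 0.52 to about 0.58, so the filter weights the new measurement more heavily when the model is suspected to be mismatched.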

Following this, the problem of designing the maneuvering tracking strategy is transformed into the determination of the compensation factor, which can be described as an MDP. The DDPG algorithm is designed to learn the compensation factor, so the action is defined as

a_{k} = λ_{k}

(24)

The innovation reflects the degree of model mismatch. The larger the innovation, the larger the model error, and the compensation factor needs to be increased accordingly. Therefore, the agent state is defined as the filter’s innovation in Equation (12), which contains three-dimensional innovation: range, azimuth angle and elevation angle, i.e.,

s_{k} = γ_{k}^{T} = {[γ_{k, r}, γ_{k, β}, γ_{k, ε}]}^{T}

(25)

When the environment, i.e., the filter, receives an action, a sufficient reward mechanism is required to evaluate the filter performance under the current action. Therefore, the filter residual is taken as the evaluation index, which is expressed as

φ_{k} = z_{k} - {\hat{z}}_{k} = z_{k} - h ({\hat{x}}_{k})

(26)

In practice, the residuals of range measurements are usually much bigger than the residuals of azimuth angle and elevation angle measurements, which leads to different sensitivity to different residual components. To make the filter equally sensitive to the residual elements, a weighting matrix W is introduced for normalizing the residuals.

W = [\begin{matrix} 1 / σ_{r} & 0 & 0 \\ 0 & 1 / σ_{β} & 0 \\ 0 & 0 & 1 / σ_{ε} \end{matrix}]

(27)

where σ_{r}, σ_{β}, and σ_{ε} represent the standard deviations of the measurement errors of range, azimuth angle, and elevation angle, respectively.

Therefore, the reward function is established as follows

r_{k} = t r [(W φ_{k}) {(W φ_{k})}^{T}]

(28)

where t r [\cdot] is the trace operation.
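Equations (26)-(28) amount to a squared norm of the normalized residual, which the following sketch computes. The function name `reward` and the numerical values are assumptions; the function returns the quantity exactly as written in Equation (28) and does not assume any sign convention for how it enters training.

```python
import numpy as np

def reward(z, z_hat, sigmas):
    """Weighted residual measure of Equations (26)-(28)."""
    W = np.diag(1.0 / np.asarray(sigmas, dtype=float))            # Equation (27)
    phi = np.asarray(z, dtype=float) - np.asarray(z_hat, dtype=float)  # Equation (26)
    w_phi = W @ phi
    # The trace of the outer product equals the squared Euclidean norm of W phi
    return float(np.trace(np.outer(w_phi, w_phi)))                # Equation (28)

# Hypothetical measurement (range in m, angles in rad) and filtered estimate
z = np.array([1010.0, 0.502, 0.299])
z_hat = np.array([1000.0, 0.500, 0.300])
sigmas = [10.0, 0.001, 0.001]
r_k = reward(z, z_hat, sigmas)
```

After normalization the three residual components are roughly 1, 2 and -1, so the 10 m range residual no longer dominates the milliradian-level angle residuals.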

In the DDPG algorithm, two neural networks, the actor network and the critic network, are utilized to approximate the action function and the action-value function. Denote A^{μ} (s_{k}) as the action function, which is parameterized by μ. It maps from state to action, i.e., a_{k} = A^{μ} (s_{k}). Moreover, the action-value function is denoted as Q^{ω} (s_{k}, a_{k}), which is parameterized by ω. Additionally, to enhance the convergence of training, one target actor network and one target critic network are introduced into DDPG, denoted as A^{μ ’} (s_{k}) and Q^{ω ’} (s_{k}, a_{k}), respectively.

The optimal action policy is realized by maximizing the expected total reward

μ = \underset{μ}{\arg \max} J (A^{μ})

(29)

The expected total reward is defined as

J (A^{μ}) = E [Q^{ω} (s, a) |_{s = s_{k}, a = A^{μ} (s_{k})}]

(30)

Therefore, the actor function is optimized toward the gradient of the expected total reward

\nabla_{μ} J (A^{μ}) = \nabla_{μ} A^{μ} (s_{k}) \nabla_{a_{k}} Q^{ω} (s_{k}, a_{k})

(31)

Similarly, the action-value function is optimized by minimizing the loss function

ω = \underset{ω}{\arg \min} L (ω)

(32)

The temporal difference error δ_{k} is utilized to establish the loss function

L (ω) = δ_{k}^{2} = {[r_{k} + ξ Q^{ω ’} (s_{k + 1}, A^{μ ’} (s_{k + 1})) - Q^{ω} (s_{k}, A^{μ} (s_{k}))]}^{2}

(33)

where r_{k} is the reward, and ξ is the discounting factor.

The action-value function is optimized along the gradient of the loss function

\nabla_{ω} L (ω) = - 2 δ_{k} \nabla_{ω} Q^{ω} (s_{k}, a_{k})

(34)
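The temporal difference error of Equation (33) and the factor -2 δ_{k} in Equation (34) can be checked numerically on a small batch. In this sketch the critic outputs are plain arrays rather than network evaluations, and the function names, the batch values, and ξ = 0.9 are assumptions for illustration.

```python
import numpy as np

def td_errors(r, q_next_target, q_current, xi):
    """Bracketed term of Equation (33), evaluated for each sample."""
    return r + xi * q_next_target - q_current

def critic_loss_and_grad(delta):
    """Mean squared TD error and its derivative with respect to each Q value."""
    loss = float(np.mean(delta ** 2))
    # Chain rule: dL/dQ_i = -2 * delta_i / N, matching the -2 delta factor of Equation (34)
    dloss_dq = -2.0 * delta / delta.size
    return loss, dloss_dq

r = np.array([1.0, 0.0])              # rewards
q_next_target = np.array([2.0, 1.0])  # target critic at (s', target actor(s'))
q_current = np.array([2.5, 1.0])      # online critic at (s, a)
delta = td_errors(r, q_next_target, q_current, xi=0.9)
loss, dloss_dq = critic_loss_and_grad(delta)
```

Multiplying `dloss_dq` by the Jacobian of the critic output with respect to ω gives the parameter gradient; an automatic differentiation framework would perform that step in practice.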

To improve the stability of the DDPG training process, a replay memory buffer ℛ is used to store the training samples. At every training step, N_{s a m p l e} training samples are utilized to train the actor and critic networks. The current transition experience (s_{k}, a_{k}, r_{k}, s_{k + 1}) is added into the buffer, replacing the oldest one once the buffer is full.
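A replay memory with this first-in, first-out behaviour can be sketched with a bounded deque. The class name `ReplayBuffer` and the capacity used below are hypothetical; uniform random sampling is assumed, in line with the random selection step of Algorithm 1.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions (s_k, a_k, r_k, s_{k+1})."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry automatically when full
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, n_sample):
        # Uniform sampling without replacement; fewer samples early in training
        n = min(n_sample, len(self.buffer))
        return random.sample(list(self.buffer), n)

    def __len__(self):
        return len(self.buffer)

buffer = ReplayBuffer(capacity=3)
for k in range(5):
    buffer.store(s=k, a=1.0 + 0.1 * k, r=-float(k), s_next=k + 1)
batch = buffer.sample(2)
```

After five insertions into a buffer of capacity three, only the transitions starting at states 2, 3 and 4 remain.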

The actor network is updated by the Adam optimizer

μ_{k + 1} = f_{A d a m} (μ_{k}, \nabla_{μ} J (A^{μ}))

(35)

\nabla_{μ} J (A^{μ}) = \frac{1}{N_{s a m p l e}} \sum_{i = 1}^{N_{s a m p l e}} \nabla_{μ} A^{μ} (s_{i}) \nabla_{a_{i}} Q^{ω} (s_{i}, a_{i})

(36)

Similarly, the critic network is updated using the negative gradient of the loss function

ω_{k + 1} = f_{A d a m} (ω_{k}, \nabla_{ω} L (ω))

(37)

\nabla_{ω} L (ω) = \frac{1}{N_{s a m p l e}} \sum_{i = 1}^{N_{s a m p l e}} - 2 δ_{i} \nabla_{ω} Q^{ω} (s_{i}, a_{i})

(38)

At the end of each training step, the two target networks are softly updated as

μ ’_{k + 1} = τ μ_{k + 1} + (1 - τ) μ ’_{k}

(39)

ω ’_{k + 1} = τ ω_{k + 1} + (1 - τ) ω ’_{k}

(40)

where τ is the updating rate.
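The soft update of Equations (39) and (40) is a per-parameter exponential moving average, as the sketch below shows. Representing each network as a list of NumPy arrays, the function name `soft_update`, and the value τ = 0.01 are assumptions made for this example.

```python
import numpy as np

def soft_update(target_params, online_params, tau):
    """Equations (39)-(40): move each target parameter toward its online value."""
    return [tau * w + (1.0 - tau) * w_target
            for w, w_target in zip(online_params, target_params)]

# Hypothetical parameter lists (one weight matrix and one bias vector)
online = [np.ones((2, 2)), np.ones(2)]
target = [np.zeros((2, 2)), np.zeros(2)]

target = soft_update(target, online, tau=0.01)
first_step = target[0][0, 0]

# Repeating the update lets the target track the online network slowly
for _ in range(500):
    target = soft_update(target, online, tau=0.01)
```

A small τ keeps the target networks nearly constant between consecutive steps, which is what stabilizes the temporal difference target.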

The implementation framework of the adaptive estimation of process noise based on DDPG is shown in Figure 3.

The process noise adaption based on DDPG is summarized in Algorithm 1.

Algorithm 1 The process noise adaption based on DDPG

1 Initialize the parameters of the actor network and the critic network

2 Initialize the target networks by copying the actor and critic networks

3 Initialize the replay memory buffer ℛ

For each episode, perform the following steps

4 Initialize the estimation state and its covariance matrix

For each timestep, perform the following steps

5 Generate an action based on the actor network and the current state: a_{k} = A^{μ} (s_{k}) + 𝒩_{k}, where the random noise 𝒩_{k} is generated by an Ornstein-Uhlenbeck process

6 Execute the action, i.e., apply the compensation factor in the filter, to obtain a new state s_{k + 1} and a reward r_{k}

7 Store the sample (s_{k}, a_{k}, r_{k}, s_{k + 1}) in the buffer ℛ

8 Randomly select N_{s a m p l e} samples from the buffer

9 Calculate the temporal difference error δ_{p} of each sample: δ_{p} = r_{p} + ξ Q^{ω ’} (s_{p + 1}, A^{μ ’} (s_{p + 1})) - Q^{ω} (s_{p}, A^{μ} (s_{p}))

10 Calculate the policy gradient: \nabla_{μ} J (A^{μ}) = \frac{1}{N_{s a m p l e}} \sum_{p = 1}^{N_{s a m p l e}} \nabla_{μ} A^{μ} (s_{p}) \nabla_{a_{p}} Q^{ω} (s_{p}, a_{p})

11 Update the actor network by the Adam optimizer: μ_{k + 1} = f_{A d a m} (μ_{k}, \nabla_{μ} J (A^{μ}))

12 Update the critic network: \nabla_{ω} L (ω) = \frac{1}{N_{s a m p l e}} \sum_{p = 1}^{N_{s a m p l e}} - 2 δ_{p} \nabla_{ω} Q^{ω} (s_{p}, a_{p}), ω_{k + 1} = f_{A d a m} (ω_{k}, \nabla_{ω} L (ω))

13 Update the two target networks by soft update: μ ’_{k + 1} = τ μ_{k + 1} + (1 - τ) μ ’_{k}, ω ’_{k + 1} = τ ω_{k + 1} + (1 - τ) ω ’_{k}

End timestep

End episode

4.2. Recursive Measurement Noise Estimation

When the sensor’s maneuver detection result is inconsistent with the fused maneuver detection result, the corresponding sensor’s measurement model is considered not to match a priori knowledge. In this case, the adaptive estimation method based on DDPG is no longer applicable to deal with it. Since the target is non-cooperative, the deviation between the process model and the actual model exists, so it is difficult to realize the optimal estimation of measurement noise in a similar way to the adaptive estimation of process noise. To modify the measurement noise adaptively, the recursive estimation of measurement noise is introduced [34].

The estimation of

{\hat{R}}_{k + 1}

can be obtained by maximizing the a posteriori density function, which is given by

{\hat{R}}_{k + 1} = \frac{1}{k + 1} \sum_{j = 1}^{k + 1} (z_{j} - h ({\hat{x}}_{j})) {(z_{j} - h ({\hat{x}}_{j}))}^{T}

(41)

The residual can be represented as

\begin{matrix} z_{j} - h ({\hat{x}}_{j}) & = & z_{j} - h ({\hat{x}}_{j | j - 1} + K_{j} γ_{j}) \\ & = & z_{j} - h ({\hat{x}}_{j | j - 1}) - H_{j} K_{j} γ_{j} + o (K_{j} γ_{j}) \\ & \approx & (I - H_{j} K_{j}) γ_{j} \end{matrix}

(42)

Substituting Equation (42) into Equation (41) yields

{\hat{R}}_{k + 1} = \frac{1}{k + 1} \sum_{j = 1}^{k + 1} [(I - H_{j} K_{j}) γ_{j} γ_{j}^{T} {(I - H_{j} K_{j})}^{T}]

(43)

The expectation of the estimated measurement noise covariance can be expressed as follows (the detailed derivation can be found in the appendix of [34]):

\begin{matrix} E [{\hat{R}}_{k + 1}] & = & E \{\frac{1}{k + 1} \sum_{j = 1}^{k + 1} (I - H_{j} K_{j}) γ_{j} γ_{j}^{T} {(I - H_{j} K_{j})}^{T}\} \\ & = & \frac{1}{k + 1} \sum_{j = 1}^{k + 1} [R_{j} - (I - H_{j} K_{j}) H_{j} P_{j | j - 1} H_{j}^{T}] \\ & = & R - \frac{1}{k + 1} \sum_{j = 1}^{k + 1} [(I - H_{j} K_{j}) H_{j} P_{j | j - 1} H_{j}^{T}] \end{matrix}

(44)

The statistical residual covariance is calculated as the variance of the historical residual sequence

{\bar{R}}_{k + 1} = (I - H_{k + 1} K_{k + 1}) [γ_{k + 1} γ_{k + 1}^{T} {(I - H_{k + 1} K_{k + 1})}^{T} + H_{k + 1} P_{k + 1 | k} H_{k + 1}^{T}]

(45)

Accordingly, to approximate the real measurement covariance, the modified measurement covariance is updated in the following recursive form

{\hat{R}}_{k + 1} = (1 - ρ_{k + 1}) {\hat{R}}_{k} + ρ_{k + 1} {\bar{R}}_{k + 1}

(46)

ρ_{k + 1} = \frac{1 - b}{1 - b^{k + 1}}

(47)

where

ρ_{k + 1}

is the weighting factor, and

b

is a fading factor. The initial value of

{\hat{R}}_{k}

is the a priori measurement noise covariance matrix.

Therefore, the gain matrix in Equation (6) is replaced by

K_{k + 1} = P_{k + 1 | k} H_{k + 1}^{T} {(H_{k + 1} P_{k + 1 | k} H_{k + 1}^{T} + {\hat{R}}_{k + 1})}^{- 1}

(48)

Modifying the measurement covariance adjusts the gain matrix, which reduces the impact of an inaccurate a priori measurement noise covariance.
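
A minimal NumPy sketch of Equations (45)–(48) is given below. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the time index `k` is assumed to start at 0, and the gain `K` passed to the covariance update is whichever gain was used for the current state update, since the text does not fix the order in which the gain and the covariance are refreshed.

```python
import numpy as np

def fading_weight(k, b):
    """Equation (47): rho_{k+1} = (1 - b) / (1 - b^{k+1}), with k starting at 0."""
    return (1.0 - b) / (1.0 - b ** (k + 1))

def update_measurement_cov(R_hat, gamma, H, P_pred, K, k, b):
    """Equations (45)-(46): recursive update of the measurement noise covariance.

    R_hat  : previous estimate (initially the a priori covariance)
    gamma  : innovation z_{k+1} - h(x_{k+1|k})
    H      : measurement Jacobian, P_pred : predicted state covariance
    K      : gain used for the current update (assumption, see lead-in)
    """
    I_HK = np.eye(H.shape[0]) - H @ K
    R_bar = I_HK @ (np.outer(gamma, gamma) @ I_HK.T + H @ P_pred @ H.T)
    rho = fading_weight(k, b)
    return (1.0 - rho) * R_hat + rho * R_bar

def kalman_gain(P_pred, H, R_hat):
    """Equation (48): K = P H^T (H P H^T + R_hat)^{-1}, via a linear solve."""
    S = H @ P_pred @ H.T + R_hat
    return np.linalg.solve(S.T, (P_pred @ H.T).T).T
```

Note that Equation (47) gives a weight of 1 at `k = 0`, so the first update replaces the initial value entirely, and the weight then decays toward `1 - b`.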

4.3. Fusion Algorithm

The local estimations from multiple sensors are fused in this section. Since the cross-covariance matrices of different sensors are unknown, the fault-tolerant generalized convex combination (FGCC) algorithm [44], which is based on information theory, is adopted.

For two local estimations

{\hat{x}}_{1}

and

{\hat{x}}_{2}

, the fusion rule of FGCC is

{\hat{x}}_{f} = P_{f} (υ_{1} P_{1}^{- 1} {\hat{x}}_{1} + υ_{2} P_{2}^{- 1} {\hat{x}}_{2})

(49)

P_{f} = {(υ_{1} P_{1}^{- 1} + υ_{2} P_{2}^{- 1})}^{- 1}

(50)

υ_{1} + υ_{2} = δ

(51)

where

P_{1}

and

P_{2}

are corresponding covariance matrices,

υ_{1}

and

υ_{2}

are the weighting parameters, and

δ

is an adaptive parameter constructed as

δ = \frac{ℋ (P_{1}) + ℋ (P_{2})}{ℋ (P_{1}) + ℋ (P_{2}) + ℐ (P_{1}, P_{2})}

(52)

In the above equation,

ℋ

is the Shannon entropy, and

ℐ

is the symmetrized Kullback-Leibler distance between two distributions, known as the J-divergence.

ℋ (P_{i}) = \frac{1}{2} \log [{(2 π)}^{n} | P_{i} |] + \frac{n}{2}

(53)

ℐ (P_{i}, P_{j}) = 𝒟 (P_{i}, P_{j}) + 𝒟 (P_{j}, P_{i})

(54)

where

n

is the number of state vector dimensions, and

𝒟

is the Kullback-Leibler divergence.

𝒟 (P_{i}, P_{j}) = \frac{1}{2} [\ln \frac{| P_{j} |}{| P_{i} |} + \| d_{x_{i j}} \|_{P_{j}^{- 1}}^{2} + \mathrm{tr} (P_{i} P_{j}^{- 1}) - n]

(55)

where

d_{x_{i j}} = {\hat{x}}_{i} - {\hat{x}}_{j}

, and

i, j \in \{1, 2\}, i \neq j

.

The weights can be determined by the following equation

υ_{i} = \frac{δ 𝒟 (P_{i}, P_{j})}{𝒟 (P_{i}, P_{j}) + 𝒟 (P_{j}, P_{i})}

(56)
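
The two-sensor rule of Equations (49)–(56) can be sketched as follows. This is an illustrative NumPy sketch, not the reference implementation of [44]: the function names are hypothetical, and the weighted-norm term in Equation (55) is taken to be the quadratic form d^T P_j^{-1} d, which is the standard expression for the Kullback-Leibler divergence between two Gaussians.

```python
import numpy as np

def entropy(P):
    """Equation (53): Shannon (differential) entropy of a Gaussian with covariance P."""
    n = P.shape[0]
    _, logdet = np.linalg.slogdet(P)
    return 0.5 * (n * np.log(2.0 * np.pi) + logdet) + 0.5 * n

def kl_divergence(x_i, P_i, x_j, P_j):
    """Equation (55), with the norm term read as d^T P_j^{-1} d (assumption)."""
    n = P_i.shape[0]
    d = x_i - x_j
    P_j_inv = np.linalg.inv(P_j)
    _, logdet_i = np.linalg.slogdet(P_i)
    _, logdet_j = np.linalg.slogdet(P_j)
    return 0.5 * (logdet_j - logdet_i + d @ P_j_inv @ d
                  + np.trace(P_i @ P_j_inv) - n)

def fgcc_fuse(x1, P1, x2, P2):
    """Equations (49)-(52) and (56) for two local estimations."""
    d12 = kl_divergence(x1, P1, x2, P2)
    d21 = kl_divergence(x2, P2, x1, P1)
    h_sum = entropy(P1) + entropy(P2)
    delta = h_sum / (h_sum + d12 + d21)      # Equation (52)
    v1 = delta * d12 / (d12 + d21)           # Equation (56), i = 1
    v2 = delta * d21 / (d12 + d21)           # Equation (56), i = 2
    P1_inv, P2_inv = np.linalg.inv(P1), np.linalg.inv(P2)
    P_f = np.linalg.inv(v1 * P1_inv + v2 * P2_inv)
    x_f = P_f @ (v1 * P1_inv @ x1 + v2 * P2_inv @ x2)
    return x_f, P_f, (v1, v2, delta)
```

The weights are undefined when both divergences vanish (identical estimates and covariances), so a practical implementation would need a guard for that case. The differential entropy can also be negative for small covariances, so the sketch does not assume that δ lies in (0, 1].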

Obviously, the above algorithm can only be applied to the fusion of two local estimations. In order to improve the applicability of the fusion algorithm, an extended fault-tolerant generalized convex combination algorithm (EFGCC) is developed, which provides the general fusion rule for

N > 2

.

Suppose there are

N

local estimations

{\hat{x}}_{i} (i = 1, \dots, N)

and corresponding covariance matrices

P_{i}

. The fusion form of EFGCC is given as

{\hat{x}}_{f} = P_{f} \sum_{i = 1}^{N} υ_{i} P_{i}^{- 1} {\hat{x}}_{i}

(57)

P_{f} = {(\sum_{i = 1}^{N} υ_{i} P_{i}^{- 1})}^{- 1}

(58)

\sum_{i = 1}^{N} υ_{i} = δ

(59)

where the adaptive parameter

δ

is updated as

δ = \frac{\sum_{i = 1}^{N} ℋ (P_{i})}{\sum_{i = 1}^{N} ℋ (P_{i}) + \sum_{i = 1}^{N} \sum_{j = i + 1}^{N} ℐ (P_{i}, P_{j})}

(60)

The definitions of the Shannon entropy and the symmetrized Kullback-Leibler distance are the same as above. The weighting parameter

υ_{i}

can be determined by the following equation

υ_{i} = \frac{δ f_{i} (𝒟)}{\sum_{j = 1}^{N} f_{j} (𝒟)}

(61)

where

f_{i} (𝒟) = \begin{cases} 𝒟 (P_{i}, P_{N}) \prod_{j = 1}^{i - 1} 𝒟 (P_{N}, P_{j}) \prod_{j = i + 1}^{N - 1} 𝒟 (P_{N}, P_{j}), & i < N \\ \prod_{j = 1}^{N - 1} 𝒟 (P_{N}, P_{j}), & i = N \end{cases}

(62)

By the fusion rule of Equation (57), the global estimation for maneuvering targets is realized.
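
As an illustration of Equations (57), (58), (61) and (62), the weights and the fused estimate can be computed from a precomputed divergence table. The sketch below is an assumption-laden example: the function names are hypothetical, `D[i, j]` is assumed to hold 𝒟(P_i, P_j) with zero-based indices, and δ is supplied by the caller.

```python
import numpy as np

def efgcc_weights(D, delta):
    """Equations (61)-(62): weights from an N x N table D[i, j] = D(P_i, P_j)."""
    N = D.shape[0]
    last = N - 1                      # zero-based index of sensor N
    f = np.empty(N)
    for i in range(N):
        # Product of D(P_N, P_j) over j < N, skipping j = i.
        others = [D[last, j] for j in range(last) if j != i]
        prod = np.prod(others)        # an empty product evaluates to 1.0
        f[i] = prod * (D[i, last] if i < last else 1.0)
    return delta * f / f.sum()

def efgcc_fuse(xs, Ps, weights):
    """Equations (57)-(58): weighted information-form fusion of N estimations."""
    infos = [w * np.linalg.inv(P) for w, P in zip(weights, Ps)]
    P_f = np.linalg.inv(sum(infos))
    x_f = P_f @ sum(info @ x for info, x in zip(infos, xs))
    return x_f, P_f
```

For N = 2 the weights reduce to Equation (56), which gives a quick consistency check of the indexing.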

5. Simulation

In this section, the effectiveness and superiority of the proposed tracking method are demonstrated through numerical simulations.

5.1. Target Model

In the simulation experiments, the Singer model [7] is adopted as the process model. It describes the target acceleration as colored noise rather than white noise, i.e., the acceleration is assumed to be a first-order time-correlated process. The model is given by

x_{k + 1} = F_{k + 1 | k} x_{k} + w_{k}

(63)

The state vector is

x_{k} = {[x_{k}, v_{x, k}, a_{x, k}, y_{k}, v_{y, k}, a_{y, k}, z_{k}, v_{z, k}, a_{z, k}]}^{T}

, which includes the position, velocity, and acceleration in three directions.

The state transition matrix is defined as

F_{k + 1 | k} = diag (F_{x}, F_{y}, F_{z})

(64)

where

F_{x} = F_{y} = F_{z} = [\begin{matrix} 1 & T & (e^{- α T} + α T - 1) / α^{2} \\ 0 & 1 & (1 - e^{- α T}) / α \\ 0 & 0 & e^{- α T} \end{matrix}]

(65)

α

is the target’s maneuvering frequency, and

T

is the sampling period.

The process noise covariance matrix is calculated as

Q_{k} = diag (Q_{x}, Q_{y}, Q_{z})

(66)

where

Q_{x} = Q_{y} = Q_{z} = 2 α σ_{a}^{2} {[q_{i j}]}_{3 \times 3}

(67)

\begin{cases} q_{11} = \frac{1}{2 α^{5}} [1 - e^{- 2 α T} + 2 α T + 2 α^{3} T^{3} / 3 - 2 α^{2} T^{2} - 4 α T e^{- α T}] \\ q_{12} = q_{21} = \frac{1}{2 α^{4}} [e^{- 2 α T} + 1 - 2 e^{- α T} + 2 α T e^{- α T} - 2 α T + α^{2} T^{2}] \\ q_{13} = q_{31} = \frac{1}{2 α^{3}} [1 - e^{- 2 α T} - 2 α T e^{- α T}] \\ q_{22} = \frac{1}{2 α^{3}} [4 e^{- α T} - 3 - e^{- 2 α T} + 2 α T] \\ q_{23} = q_{32} = \frac{1}{2 α^{2}} [e^{- 2 α T} + 1 - 2 e^{- α T}] \\ q_{33} = \frac{1}{2 α} [1 - e^{- 2 α T}] \end{cases}

(68)
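
Equations (64)–(68) translate directly into code. The following NumPy sketch builds the per-axis and the full nine-state matrices; the function names are hypothetical, and the value of σ_a used in any call is up to the reader since it is not given in this section.

```python
import numpy as np

def singer_axis_matrices(alpha, T, sigma_a):
    """Per-axis Singer transition matrix (65) and process noise covariance (67)-(68)."""
    aT = alpha * T
    e1 = np.exp(-aT)
    e2 = np.exp(-2.0 * aT)
    F = np.array([
        [1.0, T, (e1 + aT - 1.0) / alpha**2],
        [0.0, 1.0, (1.0 - e1) / alpha],
        [0.0, 0.0, e1],
    ])
    q11 = (1 - e2 + 2*aT + 2*aT**3/3 - 2*aT**2 - 4*aT*e1) / (2 * alpha**5)
    q12 = (e2 + 1 - 2*e1 + 2*aT*e1 - 2*aT + aT**2) / (2 * alpha**4)
    q13 = (1 - e2 - 2*aT*e1) / (2 * alpha**3)
    q22 = (4*e1 - 3 - e2 + 2*aT) / (2 * alpha**3)
    q23 = (e2 + 1 - 2*e1) / (2 * alpha**2)
    q33 = (1 - e2) / (2 * alpha)
    q = np.array([[q11, q12, q13], [q12, q22, q23], [q13, q23, q33]])
    return F, 2.0 * alpha * sigma_a**2 * q

def singer_matrices_3d(alpha, T, sigma_a):
    """Equations (64) and (66): block-diagonal matrices for the x, y and z axes."""
    F, Q = singer_axis_matrices(alpha, T, sigma_a)
    return np.kron(np.eye(3), F), np.kron(np.eye(3), Q)
```

For small αT the bracketed terms in Equation (68) are differences of nearly equal numbers, so very small maneuvering frequencies may call for a series expansion instead of the closed form.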

The measurement vector is defined as

z_{k} = [r_{k}, β_{k}, ε_{k}]^{T}

, where

r_{k}

is the range,

β_{k}

is the azimuth angle, and

ε_{k}

is the elevation angle. The measurement mapping function is

h (x_{k + 1}) = [\begin{array}{l} \sqrt{x_{k + 1}^{2} + y_{k + 1}^{2} + z_{k + 1}^{2}} \\ \arctan (z_{k + 1} / x_{k + 1}) \\ \arctan (y_{k + 1} / \sqrt{x_{k + 1}^{2} + z_{k + 1}^{2}}) \end{array}]

(69)
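
A short sketch of the measurement mapping in Equation (69) is shown below. The function name is hypothetical, and `np.arctan2` is swapped in for the plain arctangent of the equation: the two agree when x > 0, while `arctan2` also resolves the quadrant and avoids division by zero.

```python
import numpy as np

def measure(state):
    """Map the nine-dimensional state to [range, azimuth, elevation] per Equation (69).

    State ordering: [x, v_x, a_x, y, v_y, a_y, z, v_z, a_z].
    """
    x, y, z = state[0], state[3], state[6]
    rng_ = np.sqrt(x * x + y * y + z * z)
    azimuth = np.arctan2(z, x)                 # arctan(z / x) for x > 0
    elevation = np.arctan2(y, np.hypot(x, z))  # arctan(y / sqrt(x^2 + z^2))
    return np.array([rng_, azimuth, elevation])
```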

The initial covariance matrix of the measurement noise is

R = [\begin{matrix} σ_{r}^{2} & 0 & 0 \\ 0 & σ_{β}^{2} & 0 \\ 0 & 0 & σ_{ε}^{2} \end{matrix}]

(70)

5.2. Construction of Neural Networks

In the proposed process noise estimation based on DDPG, the actor network is used to learn the action policy from the input of the innovation

{[γ_{k, r}, γ_{k, β}, γ_{k, ε}]}^{T}

to the output of the compensation factor

λ_{k}

, and the critic network is used to obtain the action-value function based on the innovation and the compensation factor. Four-layer fully connected networks are commonly utilized in deep reinforcement learning applications [45,46]. Hence, the structure of the actor network is designed as shown in Figure 4, and the critic network adopts the same structure. Too few neurons lead to underfitting, whereas too many neurons lead to overfitting and increased training time. Based on [42,45,46] and simulation experiments, the sizes of the neural networks are set as in Table 1.

The activation function is the

\tanh

function, which is given by

\tanh (x) = \frac{e^{x} - e^{- x}}{e^{x} + e^{- x}}

(71)
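
To make the actor mapping concrete, a forward pass of a small fully connected network with tanh activations can be sketched as follows. Several things here are assumptions: "four-layer" is read as an input layer, two hidden layers and an output layer; the hidden width is a free parameter because Table 1 is not reproduced here; tanh is also applied at the output; and the random initialization is purely illustrative.

```python
import numpy as np

def init_actor(hidden, rng, n_in=3, n_out=1):
    """Random weights for an n_in -> hidden -> hidden -> n_out network (illustrative)."""
    sizes = [n_in, hidden, hidden, n_out]
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def actor_forward(params, innovation):
    """Map the innovation [gamma_r, gamma_beta, gamma_eps] to a scalar action."""
    a = np.asarray(innovation, dtype=float)
    for W, b in params:
        a = np.tanh(a @ W + b)   # Equation (71) applied element-wise
    return a
```

Because the output passes through tanh, the raw action lies in (-1, 1); any scaling to the admissible range of the compensation factor would be a separate step.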

Simulation parameters are summarized in Table 2.

The simulation environment is defined as

6000 km \times 100 km \times 6000 km

, in which the target tracking task is carried out. The training task includes 500 random episodes, and each episode runs 1000 timesteps. The maximum acceleration of the target is 30 m/s², and the minimum acceleration is −30 m/s². For each episode, the target randomly maneuvers with values uniformly distributed between the minimum and the maximum accelerations. According to the definition of the training environment and the maneuver space, training scenarios are constructed.

5.3. Simulation Results

To demonstrate the superiority of the proposed filter, IMM [9], STF [22], RNSTF [33], and AKF [34] are simulated for comparison. Two simulation scenarios are conducted in this section to illustrate the feasibility of the proposed noise-adaption EKF. In the first scenario, the target makes continuous time-varying maneuvers. In the second scenario, the target is set to make abrupt maneuvers to verify the performance against strong maneuvers.

In the simulation, the fading factor in the recursive measurement noise estimation and AKF is

b = 0.9

, and the maneuvering frequency in the Singer model is

α = 1 / 20

. The forgetting factor in STF and RNSTF is 0.95. The CA model, Singer model, and CS model are adopted in IMM, and the maneuvering frequency in the CS model is

α = 1 / 20

. The initial probability of each sub-model is

[\begin{matrix} 0.4 & 0.4 & 0.2 \end{matrix}]

, and the Markov transition probability matrix is set as

P = [\begin{matrix} 0.95 & 0.025 & 0.025 \\ 0.025 & 0.95 & 0.025 \\ 0.025 & 0.025 & 0.95 \end{matrix}]

(72)

The root mean square error (RMSE) is used to evaluate the tracking accuracy and is given as

RMSE = \sqrt{\sum_{j = 1}^{N_{MC}} {({\hat{x}}_{k}^{j} - x_{k}^{j})}^{2} / N_{MC}}

(73)

where

{\hat{x}}_{k}

is the estimated value, and

x_{k}

is the real value.

N_{MC}

is the number of Monte Carlo experiments, which is 100 in the following simulations.
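
Equation (73) can be evaluated per timestep over all Monte Carlo runs in one call. In this sketch the function name and the array layout (runs along the first axis, timesteps along the second) are assumptions.

```python
import numpy as np

def rmse(estimates, truths):
    """Equation (73): RMSE at each timestep k, averaged over Monte Carlo runs.

    Both inputs have shape (N_MC, K); the result has shape (K,).
    """
    err = np.asarray(estimates, dtype=float) - np.asarray(truths, dtype=float)
    return np.sqrt(np.mean(err**2, axis=0))
```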

5.3.1. Continuous Time Varying Maneuver

In this scenario, the target’s acceleration is continuous time-varying in sinusoidal form, which can be described as

a = A \sin (ω t + φ_{0})

(74)

where

A

is the amplitude,

ω

is the frequency, and

φ_{0}

is the initial phase. The entire flight process lasts 1000 s. The initial position is [800 km, 80 km, 800 km], and the initial velocity is [−300 m/s, 10 m/s, −300 m/s]. The sinusoidal parameters are set as shown in Table 3.

Three sensors located at the origin are assumed to track the maneuvering target with a sampling period of 1 s, and the a priori measurement errors are set as shown in Table 4. Furthermore, to verify the performance of the proposed noise-adaption EKF under the condition of inaccurate measurement noise, the magnitude of the range measurement noise of

S_{1}

is enlarged five times during 500–600 s, which is unknown to the filter.

The algorithms mentioned above are utilized for tracking solution, and the same fusion method, i.e., EFGCC, is adopted to obtain fusion trajectories. For convenience, the proposed noise-adaption EKF is denoted as NAEKF in the following simulation results. The estimated trajectories are shown in Figure 5.

The position estimation errors are presented in Figure 6. To correspond to the measurements, the position estimation errors are given in terms of range, azimuth angle and elevation angle. The position errors of NAEKF are smaller than those of the other filters, especially in the stage of inaccurate measurement noise. Correspondingly, Figure 7 shows the

σ

-boundaries of position estimation errors. At the beginning of the simulation, position errors are generally large due to initialization errors, but the estimation accuracy gradually improves over time. After 100 s, the range error of the proposed filter converges to 3.27 m. The azimuth angle error reaches 0.0033°, and the elevation angle error is stable around 0.0029°. Due to the effect of inaccurate measurement noise during 500–600 s, the stable value of the range error reaches 3.54 m. Similarly, the azimuth angle error increases slightly and finally stabilizes at 0.0043°, and the elevation angle error stabilizes at 0.0043°. To compare the local estimation accuracy of the different algorithms, the detailed estimation errors are shown in Table 5, Table 6 and Table 7. Taking the range estimation of

S_{1}

as an example, the estimation accuracy of NAEKF is 8.95%, 21.28% and 20.90% higher than that of IMM, STF and RNSTF, respectively, and the estimation error of AKF is almost seven times that of NAEKF. Overall, the proposed filter achieves better tracking accuracy than the other filters.

Figure 8 and Figure 9 show the velocity estimation errors and the corresponding

σ

-boundaries in three directions, respectively. As shown in the simulation results, the velocity estimation accuracy of IMM, STF, RNSTF and AKF is worse than that of NAEKF. It can be seen from Figure 9 that the velocity estimation errors of STF and RNSTF increase significantly when inaccurate measurement noise occurs, while those of IMM and NAEKF do not. Taking the velocity estimation in the x direction as an example, the estimation error of STF increases from 24.85 m/s to 35.20 m/s, and that of RNSTF increases from 24.87 m/s to 35.27 m/s. The estimation error of IMM is stable around 18.20 m/s, while the estimation error of AKF reaches 98.65 m/s. In contrast, the velocity estimation error of NAEKF in the x direction is 13.83 m/s. The tracking results in the y and z directions are similar to those in the x direction and are not repeated here. Averaged over the three directions, the velocity estimation error of NAEKF is 14.21 m/s.

The acceleration estimation errors and the corresponding

σ

-boundaries of acceleration estimation errors are presented in Figure 10 and Figure 11, respectively. The average acceleration estimation accuracy of NAEKF is 1.27

m / s^{2}

, whereas that of IMM, STF, RNSTF and AKF is around 1.86

m / s^{2}

, 4.20

m / s^{2}

, 4.11

m / s^{2}

and 28.01

m / s^{2}

, respectively. It can be confirmed that the proposed NAEKF can effectively deal with unknown sinusoidal maneuvers and achieves better tracking performance than IMM, STF, RNSTF as well as AKF. Table 8 provides computing times of five algorithms. Since NAEKF adds steps such as maneuver detection, the computing time is slightly longer than that of STF and RNSTF. However, NAEKF takes less computing time than IMM and AKF, because the parallel calculation of sub filters in IMM and the simultaneous noise estimation in AKF are time-consuming.

On the basis of the above experiment settings, the magnitude of the angle measurement noises of

S_{1}

is also enlarged five times during 500–600 s. Simulation results show that the matrix singularity problem of AKF occurs in this simulation experiment. As described in the introduction, AKF shows poor stability because it updates the process noise and measurement noise simultaneously without distinction. Therefore, only IMM, STF, RNSTF and NAEKF are presented in this experiment.

Figure 12 and Figure 13 show the position estimation errors and the corresponding

σ

-boundaries. Although the inaccurate measurement noises lead to certain increases in estimation errors, NAEKF shows better estimation accuracy and robustness than the other three algorithms. Taking the azimuth angle as an example, the estimation error of NAEKF is 0.0068°, while that of IMM reaches 0.0098°. Additionally, the estimation errors of STF and RNSTF are 0.0159° and 0.0158°, respectively, which are more than twice the estimation error of NAEKF.

The estimation results of the velocity and acceleration are shown in Figure 14, Figure 15, Figure 16 and Figure 17. It is obvious that the proposed NAEKF can effectively deal with inaccurate measurement noises in this experiment and achieves the best estimation accuracy compared to other algorithms. The velocity estimation errors of IMM, STF and RNSTF are several times that of NAEKF, and the same conclusion can be drawn from the acceleration estimation results.

5.3.2. Abrupt Maneuver

In this scenario, the target’s acceleration is set to be abrupt, and the maneuver parameters are shown in Table 9. The initial position is [6000 km, 15 km, 6000 km], and the initial velocity is [−1000 m/s, 0 m/s, −1000 m/s].

The measurement errors are the same as in Table 4, and the magnitude of the range measurement noise of

S_{1}

is enlarged five times during 700–800 s. It should be noted that the matrix singularity problem of AKF also occurs in this simulation scenario. Therefore, only IMM, STF, RNSTF and NAEKF are presented in this section. The tracking results are shown in Figure 18.

The position estimation errors and the corresponding

σ

-boundaries are shown in Figure 19 and Figure 20, respectively. Simulation results show that the proposed filter can track the abrupt maneuver target effectively. Owing to the D-S maneuver detection, the proposed noise-adaption EKF can detect unknown maneuvers and inaccurate measurement noise simultaneously. As a result, the proposed filter can utilize the process noise adaption based on DDPG to adaptively cope with the unknown maneuver. It can be seen from Figure 19a that abrupt maneuvers cause several error increases in the tracking process, but they are quickly eliminated. When inaccurate measurement noise is detected, the proposed filter utilizes the recursive measurement noise adaption to weaken its influence. Certain increases in estimation errors are caused by inaccurate measurement noise during 700–800 s. The estimation errors of STF and RNSTF increase remarkably, especially when the inaccurate measurement noise appears. The estimation errors of different algorithms are shown in Table 10, Table 11 and Table 12. It is evident that the proposed filter achieves higher estimation accuracy than IMM, STF and RNSTF in both local estimation and fusion estimation.

Figure 21 and Figure 22 present velocity estimation errors and corresponding

σ

-boundaries. The velocity estimation errors of NAEKF in the x and z directions are 77.32 m/s and 77.86 m/s, respectively. However, the estimation error in the y direction reaches 146.70 m/s, because the maneuver in the y direction is much larger than that in the other directions. The estimation errors of STF and RNSTF are significantly larger than those of NAEKF. IMM achieves better estimation accuracy than STF and RNSTF, but it is still worse than NAEKF.

According to acceleration estimation results in Figure 23, there are several peaks of acceleration tracking errors during abrupt maneuvers, but they decrease rapidly. The

σ

-boundaries of acceleration estimation errors are shown in Figure 24. The acceleration estimation accuracy in x direction of NAEKF is 4.97

m / s^{2}

, whereas that of IMM, STF and RNSTF is finally stabilized at 10.23

m / s^{2}

, 12.31

m / s^{2}

and 12.49

m / s^{2}

, respectively. Similarly, the acceleration estimation accuracy in z direction of NAEKF is 5.06

m / s^{2}

, and the estimation accuracy of IMM, STF and RNSTF is around 10.38

m / s^{2}

, 12.02

m / s^{2}

and 12.24

m / s^{2}

, respectively. Due to the greater maneuver, the acceleration estimation accuracy in y direction is worse than that in other directions. Specifically, the acceleration estimation error of NAEKF reaches 10.27

m / s^{2}

. In contrast, the acceleration estimation error of IMM, STF and RNSTF is around 14.66

m / s^{2}

, 18.93

m / s^{2}

, and 18.32

m / s^{2}

, respectively. Table 13 provides computing times of four algorithms. Similarly to the first scenario, the computing time of NAEKF is shorter than that of IMM, but slightly longer than that of STF and RNSTF.

6. Conclusions

To address the problem of maneuvering target tracking with inaccurate measurements, the noise-adaption EKF based on DDPG is proposed in this paper. The proposed filter avoids the simultaneous adjustment of the process model and the measurement model without distinction, which effectively improves the robustness of the filter. Within the framework of the noise-adaption EKF, the maneuver detection based on D-S evidence theory is constructed to distinguish between the unknown maneuver and inaccurate measurement noise simultaneously by fusing multi-sensor information. Moreover, a Markovian decision process of maneuver tracking is established to cope with unknown maneuvers. DDPG is developed to learn the adaptive estimation of the compensation factor and feed it to EKF. In addition, the recursive measurement noise estimation is applied to estimate a priori measurement noise covariance online. The local estimations are fused at last, achieving the global estimation of multiple sensors. The simulation results indicate that the proposed noise-adaption EKF is effective in both scenarios of continuous time-varying maneuver and the abrupt maneuver. As shown in simulation results, the proposed tracking method has a better tracking performance compared to IMM, STF, RNSTF and AKF.

Author Contributions

Conceptualization, S.T.; methodology, J.L. and J.G.; software, J.L.; validation, J.L. and J.G.; formal analysis, S.T.; investigation, J.G.; resources, S.T.; data curation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, J.G.; visualization, J.G.; supervision, S.T.; project administration, J.G.; funding acquisition, J.G. and S.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China, grant number 11572036.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors give their sincere thanks to the editors and the anonymous reviewers for their constructive comments on the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Yin, J.; Yang, Z.; Luo, Y. Adaptive tracking method for non-cooperative continuously thrusting spacecraft. Aerospace 2021, 8, 244.
2. Wang, S.; Bi, D.; Ruan, H.; Du, M. Radar maneuvering target tracking algorithm based on human cognition mechanism. Chin. J. Aeronaut. 2019, 32, 1695–1704.
3. Lim, J.; Kim, H.; Park, H. Minimax particle filtering for tracking a highly maneuvering target. Int. J. Robust Nonlinear Control 2020, 30, 636–651.
4. Cheng, Y.; Yan, X.; Tang, S.; Wu, M.; Li, C. An adaptive non-zero mean damping model for trajectory tracking of hypersonic glide vehicles. Aerosp. Sci. Technol. 2021, 111, 106529.
5. Liu, J.; Wang, Z.; Xu, M. DeepMTT: A deep learning maneuvering target-tracking algorithm based on bidirectional LSTM network. Inf. Fusion 2020, 53, 289–304.
6. Zhu, W.; Xu, Z.; Li, B.; Wu, Z. Research on the observability of bearings-only target tracking based on multiple sonar sensors. In Proceedings of the 2012 Second International Conference on Instrumentation, Measurement, Computer, Communication and Control (IMCCC), Harbin, China, 8–10 December 2012.
7. Jia, S.; Zhang, Y.; Wang, G. Highly maneuvering target tracking using multi-parameter fusion Singer model. J. Syst. Eng. Electron. 2017, 28, 841–850.
8. Jiang, B.; Sheng, W.; Zhang, R.; Han, Y.; Ma, X. Range tracking method based on adaptive “current” statistical model with velocity prediction. Signal Process. 2017, 131, 261–270.
9. Blom, H.; Bar-Shalom, Y. The interacting multiple model algorithm for systems with Markovian switching coefficients. IEEE Trans. Autom. Control 1988, 33, 780–783.
10. Lim, J.; Kim, H.; Park, H. Interactive-multiple-model algorithm based on minimax particle filtering. IEEE Signal Process. Lett. 2020, 27, 36–40.
11. Wang, M.; Bai, Y. Multi-platform maneuvering target tracking based on BLUE-AIMM-CI algorithm. In Proceedings of the 34th Chinese Control Conference, Hangzhou, China, 28–30 July 2015.
12. Ding, Z.; Liu, Y.; Liu, J.; Yu, K.; You, Y.; Jing, P.; He, Y. Adaptive interacting multiple model algorithm based on information-weighted consensus for maneuvering target tracking. Sensors 2018, 18, 2012.
13. Yu, M.; Chen, W.; Chambers, J. State dependent multiple model-based particle filtering for ballistic missile tracking in a low-observable environment. Aerosp. Sci. Technol. 2017, 67, 144–154.
14. Li, X.; Jilkov, V.; Ru, J. Multiple-model estimation with variable structure part VI: Expected-mode augmentation. IEEE Trans. Aerosp. Electron. Syst. 2005, 41, 853–867.
15. Wang, J.; Dong, P.; Jing, Z.; Cheng, J. Consensus variable structure multiple model filtering for distributed maneuvering tracking. Signal Process. 2019, 162, 234–241.
16. Yu, M.; Oh, H.; Chen, W. An improved multiple model particle filtering approach for maneuvering target tracking using airborne GMTI with geographic information. Aerosp. Sci. Technol. 2016, 52, 62–69.
17. Gelb, A. Applied Optimal Estimation; MIT Press: Cambridge, MA, USA, 1974.
18. Kim, J.; Lee, K. Unscented Kalman filter-aided long short-term memory approach for wind nowcasting. Aerospace 2021, 8, 236.
19. Wan, E.A.; van der Merwe, R. The unscented Kalman filter for nonlinear estimation. In Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No.00EX373), Lake Louise, AB, Canada, 1–4 October 2000.
20. Zhong, Z.; Zhao, E.; Zheng, X.; Zhao, X. A consensus-based square-root cubature Kalman filter for maneuvering target tracking in sensor networks. Trans. Inst. Meas. Control 2020, 42, 3052–3062.
21. Zhou, D.; Xi, Y.; Zhang, Z. Suboptimal fading extended Kalman filtering for nonlinear systems. Control Decis. 1990, 5, 1–6.
22. Zhou, D.; Xi, Y.; Zhang, Z. A suboptimal multiple fading extended Kalman filter. Acta Autom. Sin. 1991, 17, 689–695.
23. Liu, Q.; Huang, C.; Li, P. Distributed consensus strong tracking filter for wireless sensor networks with model mismatches. Int. J. Distrib. Sens. Netw. 2017, 13, 1–10.
24. Hu, G.; Gao, D.; Zhong, Y.; Subic, A. Modified strong tracking unscented Kalman filter for nonlinear state estimation with process model uncertainty. Int. J. Adapt. Control Signal Process. 2015, 29, 1561–1577.
25. Xia, B.; Wang, H.; Wang, M.; Sun, W.; Xu, Z.; Lai, Y. A new method for state of charge estimation of lithium-ion battery based on strong tracking cubature Kalman filter. Energies 2015, 8, 13458–13472.
26. Li, J.; Zhao, R.; Chen, J.; Zhao, C.; Zhu, Y. Target tracking algorithm based on adaptive strong tracking particle filter. IET Sci. Meas. Technol. 2016, 10, 704–710.
27. Zhang, A.; Bao, S.; Gao, F.; Bi, W. A novel strong tracking cubature Kalman filter and its application in maneuvering target tracking. Chin. J. Aeronaut. 2019, 32, 2489–2502.
28. Liu, H.; Wu, W. Strong tracking spherical simplex-radial cubature Kalman filter for maneuvering target tracking. Sensors 2017, 17, 741.
29. Li, Z.; Yang, W.; Ding, D. A novel fifth-degree strong tracking cubature Kalman filter for two-dimensional maneuvering target tracking. Math. Probl. Eng. 2018, 2018, 5918456.
30. Wang, J.; Zhang, T.; Xu, X.; Li, Y. A variational Bayesian based strong tracking interpolatory cubature Kalman filter for maneuvering target tracking. IEEE Access 2018, 6, 52544–52560.
31. Zhang, H.; Xie, J.; Ge, J.; Lu, W.; Liu, B. Strong tracking SCKF based on adaptive CS model for maneuvering aircraft tracking. IET Radar Sonar Navig. 2018, 12, 742–749.
32. Ma, J.; Guo, X. Combination of IMM algorithm and ASTRWCKF for maneuvering target tracking. IEEE Access 2020, 8, 143095–143103.
33. Jiang, Y.; Ma, P.; Baoyin, H. Residual-normalized strong tracking filter for tracking a noncooperative maneuvering spacecraft. J. Guid. Control Dyn. 2019, 42, 2304–2309.
34. Gao, W.; Li, J.; Zhou, G.; Li, Q. Adaptive Kalman filtering with recursive noise estimator for integrated SINS/DVL systems. J. Navig. 2015, 68, 142–161.
35. Huang, Y.; Zhang, Y.; Wu, Z.; Li, N.; Chambers, J. A novel adaptive Kalman filter with inaccurate process and measurement noise covariance matrices. IEEE Trans. Autom. Control 2018, 63, 594–601.
36. Ge, B.; Zhang, H.; Jiang, L.; Li, Z.; Butt, M. Adaptive unscented Kalman filter for target tracking with unknown time-varying noise covariance. Sensors 2019, 19, 1371.
37. Fang, X.; Chen, L. Noise-aware maneuvering target tracking algorithm in wireless sensor networks by a novel adaptive cubature Kalman filter. IET Radar Sonar Navig. 2020, 14, 1795–1802.
38. Fang, W.; Jiang, J.; Lu, S.; Gong, Y.; Tao, Y.; Tang, Y.; Yan, P.; Luo, H.; Liu, J. A LSTM algorithm estimating pseudo measurements for aiding INS during GNSS signal outages. Remote Sens. 2020, 12, 256.
39. Wu, F.; Luo, H.; Jia, H.; Zhao, F.; Xiao, Y.; Gao, X. Predicting the noise covariance with a multitask learning model for Kalman filter-based GNSS/INS integrated navigation. IEEE Trans. Instrum. Meas. 2021, 70, 8500613.
40. Hu, L.; Tang, Y.; Zhou, Z.; Pan, W. Reinforcement learning for orientation estimation using inertial sensors with performance guarantee. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021.
41. Tang, Y.; Hu, L.; Zhang, Q.; Pan, W. Reinforcement learning compensated extended Kalman filter for attitude estimation. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), New York, NY, USA, 27 September–1 October 2021.
42. Gao, X.; Luo, H.; Ning, B.; Zhao, F.; Bao, L.; Gong, Y.; Xiao, Y.; Jiang, J. RL-AKF: An adaptive Kalman filter navigation algorithm based on reinforcement learning for ground vehicles. Remote Sens. 2020, 12, 1704.
43. Liu, Y.; Wang, X.; Liu, K. Network anomaly detection system with optimized DS Evidence Theory. Sci. World J. 2014, 753659, 1–14.
44. Wang, Y.; Li, X. Distributed estimation fusion with unavailable cross-correlation. IEEE Trans. Aerosp. Electron. Syst. 2012, 48, 259–278.
45. Henderson, P.; Islam, R.; Bachman, P.; Pineau, J.; Precup, D.; Meger, D. Deep reinforcement learning that matters. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
46. Li, B.; Gan, Z.; Chen, D.; Sergey Aleksandrovich, D. UAV maneuvering target tracking in uncertain environments based on deep reinforcement learning and meta-learning. Remote Sens. 2020, 12, 3789.

Figure 1. The framework of the noise-adaption EKF.

Figure 2. The basic framework of DDPG.

Figure 3. The implementation framework of the process noise adaption based on DDPG.

Figure 4. The structure of actor network.

Figure 5. Estimated trajectories.

Figure 6. Position estimation errors. (a) Estimation errors of the range; (b) Estimation errors of the azimuth angle; (c) Estimation errors of the elevation angle.

Figure 7. $σ$-Boundary of position estimation errors. (a) $σ$-Boundary of the range; (b) $σ$-Boundary of the azimuth angle; (c) $σ$-Boundary of the elevation angle.

Figure 8. Velocity estimation errors. (a) Estimation errors of $v_{x}$; (b) Estimation errors of $v_{y}$; (c) Estimation errors of $v_{z}$.

Figure 9. $σ$-Boundary of velocity estimation errors. (a) $σ$-Boundary of $v_{x}$; (b) $σ$-Boundary of $v_{y}$; (c) $σ$-Boundary of $v_{z}$.

Figure 10. Acceleration estimation errors. (a) Estimation errors of $a_{x}$; (b) Estimation errors of $a_{y}$; (c) Estimation errors of $a_{z}$.

Figure 11. $σ$-Boundary of acceleration estimation errors. (a) $σ$-Boundary of $a_{x}$; (b) $σ$-Boundary of $a_{y}$; (c) $σ$-Boundary of $a_{z}$.

Figure 12. Position estimation errors. (a) Estimation errors of the range; (b) Estimation errors of the azimuth angle; (c) Estimation errors of the elevation angle.

Figure 13. $σ$-Boundary of position estimation errors. (a) $σ$-Boundary of the range; (b) $σ$-Boundary of the azimuth angle; (c) $σ$-Boundary of the elevation angle.

Figure 14. Velocity estimation errors. (a) Estimation errors of $v_{x}$; (b) Estimation errors of $v_{y}$; (c) Estimation errors of $v_{z}$.

Figure 15. $σ$-Boundary of velocity estimation errors. (a) $σ$-Boundary of $v_{x}$; (b) $σ$-Boundary of $v_{y}$; (c) $σ$-Boundary of $v_{z}$.

Figure 16. Acceleration estimation errors. (a) Estimation errors of $a_{x}$; (b) Estimation errors of $a_{y}$; (c) Estimation errors of $a_{z}$.

Figure 17. $σ$-Boundary of acceleration estimation errors. (a) $σ$-Boundary of $a_{x}$; (b) $σ$-Boundary of $a_{y}$; (c) $σ$-Boundary of $a_{z}$.

Figure 18. Estimated trajectories.

Figure 19. Position estimation errors. (a) Estimation errors of the range; (b) Estimation errors of the azimuth angle; (c) Estimation errors of the elevation angle.

Figure 20. $σ$-Boundary of position estimation errors. (a) $σ$-Boundary of the range; (b) $σ$-Boundary of the azimuth angle; (c) $σ$-Boundary of the elevation angle.

Figure 21. Velocity estimation errors. (a) Estimation errors of $v_{x}$; (b) Estimation errors of $v_{y}$; (c) Estimation errors of $v_{z}$.

Figure 22. $σ$-Boundary of velocity estimation errors. (a) $σ$-Boundary of $v_{x}$; (b) $σ$-Boundary of $v_{y}$; (c) $σ$-Boundary of $v_{z}$.

Figure 23. Acceleration estimation errors. (a) Estimation errors of $a_{x}$; (b) Estimation errors of $a_{y}$; (c) Estimation errors of $a_{z}$.

Figure 24. $σ$-Boundary of acceleration estimation errors. (a) $σ$-Boundary of $a_{x}$; (b) $σ$-Boundary of $a_{y}$; (c) $σ$-Boundary of $a_{z}$.

Table 1. Neural network settings.

Layers	Actor Network	Critic Network
Input layer	3	4
Hidden layer 1	128	128
Hidden layer 2	128	128
Output layer	1	1
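
As a rough check on the sizes in Table 1, the sketch below counts the trainable parameters implied by the listed layer widths. It assumes plain fully connected layers with one bias per unit, which Table 1 does not state. The helper name `mlp_param_count` is illustrative only.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    Assumption (not stated in Table 1): every layer is dense and
    carries one bias term per output unit.
    """
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Layer widths copied from Table 1: input, hidden 1, hidden 2, output.
ACTOR_LAYERS = [3, 128, 128, 1]
CRITIC_LAYERS = [4, 128, 128, 1]

actor_params = mlp_param_count(ACTOR_LAYERS)
critic_params = mlp_param_count(CRITIC_LAYERS)

if __name__ == "__main__":
    print("actor parameters:", actor_params)
    print("critic parameters:", critic_params)
```

Under these assumptions the actor has 17,153 parameters and the critic 17,281. The critic's four inputs are consistent with a three-dimensional state concatenated with the scalar action, as is usual for a DDPG critic.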

Table 2. Simulation parameters of DDPG.

Parameters	Value
The discounting factor $ξ$	0.99
The updating rate $τ$	0.001
The number of samples $N_{sample}$	64
The number of episodes	500
The number of timesteps	1000
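
To show where the values in Table 2 would typically enter a DDPG training loop, the sketch below implements the standard soft target-network update and the discounted temporal-difference target. It assumes that $τ$ is the soft-update rate and $ξ$ the discount factor in the usual DDPG sense. The function names and the flat-list parameter representation are illustrative and are not taken from the paper.

```python
DISCOUNT = 0.99    # discounting factor ξ from Table 2
TAU = 0.001        # updating rate τ from Table 2
BATCH_SIZE = 64    # number of samples drawn per update, N_sample


def soft_update(target_params, online_params, tau=TAU):
    """Blend online parameters into the target network.

    Standard DDPG rule: theta_target <- tau * theta + (1 - tau) * theta_target.
    Parameters are plain lists of floats here for illustration.
    """
    return [
        tau * w + (1.0 - tau) * w_t
        for w, w_t in zip(online_params, target_params)
    ]


def td_target(reward, next_q, done=False, discount=DISCOUNT):
    """Critic regression target y = r + discount * Q'(s', mu'(s')).

    The bootstrap term is dropped when the episode has terminated.
    """
    return reward + (0.0 if done else discount * next_q)


if __name__ == "__main__":
    print(soft_update([0.0, 1.0], [1.0, 1.0]))
    print(td_target(reward=1.0, next_q=2.0))
```

With $τ$ = 0.001 the target networks move only 0.1% of the way toward the online networks per update, which is the usual reason for choosing such a small value: it keeps the critic's regression target slowly varying.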

Table 3. Sinusoidal parameters.

Directions	$x$	$y$	$z$
$A$	1	2	1
$ω$	0.005	0.01	0.005
$φ_{0}$	30	30	30

Table 4. Measurement accuracy of multiple sensors.

Sensors	$σ_{r}$ (m)	$σ_{β}$ (°)	$σ_{ε}$ (°)
$S_{1}$	10	0.01	0.01
$S_{2}$	5	0.02	0.02
$S_{3}$	10	0.02	0.02
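
The standard deviations in Table 4 can be turned into a priori measurement noise covariances as sketched below. The sketch assumes uncorrelated range, azimuth, and elevation errors (a diagonal covariance), a measurement ordering of range, azimuth, elevation, and angles handled in radians inside the filter. None of these choices is confirmed by Table 4, and the names are illustrative.

```python
import math

# (sigma_r in metres, sigma_beta in degrees, sigma_eps in degrees), from Table 4.
SENSOR_SIGMAS = {
    "S1": (10.0, 0.01, 0.01),
    "S2": (5.0, 0.02, 0.02),
    "S3": (10.0, 0.02, 0.02),
}


def measurement_covariance(sigma_r_m, sigma_beta_deg, sigma_eps_deg):
    """Return a 3x3 diagonal covariance as nested lists.

    Assumption: the three error sources are independent, and the
    angular standard deviations are converted from degrees to radians.
    """
    sigmas = (
        sigma_r_m,
        math.radians(sigma_beta_deg),
        math.radians(sigma_eps_deg),
    )
    return [
        [s * s if i == j else 0.0 for j in range(3)]
        for i, s in enumerate(sigmas)
    ]


# One a priori covariance per sensor.
R = {name: measurement_covariance(*sig) for name, sig in SENSOR_SIGMAS.items()}

if __name__ == "__main__":
    for name, cov in R.items():
        print(name, [cov[i][i] for i in range(3)])
```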

Table 5. Estimation errors of the range (m).

Sensors	IMM	STF	RNSTF	AKF	NAEKF
$S_{1}$	15.7454	17.5285	17.4727	96.1808	14.4525
$S_{2}$	4.8372	3.9745	3.9816	3.8560	3.8291
$S_{3}$	8.9003	8.1024	8.1208	7.604	7.3634
Fusion	4.7536	8.1988	8.4076	129.2355	3.5371

Table 6. Estimation errors of the azimuth angle (°).

Sensors	IMM	STF	RNSTF	AKF	NAEKF
$S_{1}$	0.0073	0.0098	0.0098	0.0116	0.0057
$S_{2}$	0.0110	0.0130	0.0133	0.0088	0.0090
$S_{3}$	0.0101	0.0135	0.0139	0.0099	0.0087
Fusion	0.0052	0.0076	0.0077	0.0108	0.0043

Table 7. Estimation errors of the elevation angle (°).

Sensors	IMM	STF	RNSTF	AKF	NAEKF
$S_{1}$	0.0076	0.0100	0.0098	0.4293	0.0060
$S_{2}$	0.0108	0.0140	0.0136	0.0097	0.0097
$S_{3}$	0.0102	0.0145	0.0139	0.0328	0.0094
Fusion	0.0051	0.0076	0.0074	0.1790	0.0043

Table 8. Computing time (s).

IMM	STF	RNSTF	AKF	NAEKF
1.9645	1.7343	1.8173	3.3055	1.8572

Table 9. Maneuver parameters of the target.

Time (s)	$a_{x}$ (m/s²)	$a_{y}$ (m/s²)	$a_{z}$ (m/s²)	Time (s)	$a_{x}$ (m/s²)	$a_{y}$ (m/s²)	$a_{z}$ (m/s²)
0–60	0	0	0	480–540	2.5	−5	2.5
60–120	−22.5	18	−22.5	540–600	−12.5	24	−12.5
120–180	2.5	−10	2.5	600–660	2.5	−14	2.5
180–240	−15	−15	−15	660–720	7.5	−10	7.5
240–300	4.5	−10	4.5	720–780	−9	10	−9
300–360	−20	29	−20	780–840	−5.5	11	−5.5
360–420	−10	−8	−10	840–900	4.5	−9	4.5
420–480	−12.5	−10	−12.5	900–1000	2.5	−12	2.5
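
For reproducing the abrupt-maneuver scenario, the schedule in Table 9 can be encoded as a piecewise-constant lookup, as in the sketch below. It assumes that each segment is half-open, so that a boundary time such as t = 60 s belongs to the segment that starts there, and that the final segment includes t = 1000 s. Table 9 does not specify how boundaries are assigned, and the function name is illustrative.

```python
from bisect import bisect_right

# Segment start times in seconds: 0, 60, ..., 900 (the last segment ends at 1000).
SEGMENT_STARTS = list(range(0, 901, 60))
END_TIME = 1000.0

# Accelerations (x, y, z) in m/s^2 for each segment, copied from Table 9.
ACCELERATIONS = [
    (0.0, 0.0, 0.0),
    (-22.5, 18.0, -22.5),
    (2.5, -10.0, 2.5),
    (-15.0, -15.0, -15.0),
    (4.5, -10.0, 4.5),
    (-20.0, 29.0, -20.0),
    (-10.0, -8.0, -10.0),
    (-12.5, -10.0, -12.5),
    (2.5, -5.0, 2.5),
    (-12.5, 24.0, -12.5),
    (2.5, -14.0, 2.5),
    (7.5, -10.0, 7.5),
    (-9.0, 10.0, -9.0),
    (-5.5, 11.0, -5.5),
    (4.5, -9.0, 4.5),
    (2.5, -12.0, 2.5),
]


def acceleration_at(t):
    """Return the commanded (ax, ay, az) at time t in seconds.

    Assumption: segments are [start, next_start), with the final
    segment closed at END_TIME.
    """
    if not 0.0 <= t <= END_TIME:
        raise ValueError("t must lie within the 0 to 1000 s scenario")
    return ACCELERATIONS[bisect_right(SEGMENT_STARTS, t) - 1]


if __name__ == "__main__":
    for t in (0.0, 59.9, 60.0, 539.9, 1000.0):
        print(t, acceleration_at(t))
```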

Table 10. Estimation errors of the range (m).

Sensors	IMM	STF	RNSTF	NAEKF
$S_{1}$	18.2491	18.0910	18.2003	17.4784
$S_{2}$	5.0014	5.2004	5.2474	4.6219
$S_{3}$	9.4525	9.2029	9.3168	8.6291
Fusion	4.9532	11.8334	12.0310	4.8331

Table 11. Estimation errors of the azimuth angle (°).

Sensors	IMM	STF	RNSTF	NAEKF
$S_{1}$	0.0062	0.0098	0.0099	0.0064
$S_{2}$	0.0110	0.0137	0.0143	0.0104
$S_{3}$	0.0117	0.0133	0.0136	0.0096
Fusion	0.0048	0.0079	0.0080	0.0047

Table 12. Estimation errors of the elevation angle (°).

Sensors	IMM	STF	RNSTF	NAEKF
$S_{1}$	0.0062	0.0093	0.0092	0.0057
$S_{2}$	0.0102	0.0141	0.0137	0.0107
$S_{3}$	0.0112	0.0134	0.0131	0.0098
Fusion	0.0050	0.0076	0.0074	0.0045

Table 13. Computing time (s).

IMM	STF	RNSTF	NAEKF
1.9920	1.6958	1.8096	1.8723

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

