Noise-Adaption Extended Kalman Filter Based on Deep Deterministic Policy Gradient for Maneuvering Targets

Although there have been numerous studies on maneuvering target tracking, few studies have focused on the distinction between unknown maneuvers and inaccurate measurements, leading to low accuracy, poor robustness, or even divergence. To this end, a noise-adaption extended Kalman filter is proposed to track maneuvering targets with multiple synchronous sensors. This filter avoids the simultaneous adjustment of the process model and measurement model without distinction. Instead, the maneuver detection based on the Dempster-Shafer evidence theory is constructed to achieve the reliable distinction between unknown maneuvers and inaccurate measurements by fusing multi-sensor information, which effectively improves the robustness of the filter. Moreover, the adaptive estimation of the process noise covariance is modeled by a Markovian decision process with a proper reward function. Deep deterministic policy gradient is designed to obtain the optimal process noise covariance by taking the innovation as the state and the compensation factor as the action. Furthermore, the recursive estimation of the measurement noise covariance is applied to modify a priori measurement noise covariance of the corresponding sensor. Finally, the fusion algorithm is developed for the global estimation. Simulation experiments are carried out in two scenarios, and simulation results illustrate the feasibility and superiority of the proposed algorithm.


Introduction
With the development of highly maneuverable targets, such as hypersonic aircraft and missiles, maneuvering target tracking has recently drawn a lot of attention [1,2]. During the past few decades, many tracking methods have been developed and investigated for maneuvering targets [3][4][5]. Among the existing target tracking approaches, designing an appropriate model for the target maneuvering motion is regarded as an important part of decreasing the tracking errors caused by model mismatch. Therefore, many maneuver models have been proposed, such as the constant acceleration (CA) model [6], the Singer model [7], and the current statistical (CS) model [8]. In contrast to single-model methods, the interacting multiple model (IMM) algorithm [9,10] has been widely applied because of its adaptive capability for maneuvering target tracking problems. By mapping the motion mode to a model set, IMM runs a parallel filter for each model. Furthermore, based on the residual information and a priori information, the state estimation results of each model are weighted and synthesized to accurately achieve maneuvering target tracking. Considering the limited performance of IMM based on a single platform, Wang et al. [11] proposed a multi-platform maneuvering target tracking algorithm based on IMM and the best linear unbiased estimate filter. Moreover, an adaptive interacting multiple model algorithm based on information-weighted consensus was proposed [12], which presented better adaptability and accuracy than the classical IMM. To cope with the miss detections and false alarms in a low-observable environment, a state-dependent IMM was developed. Based on the knowledge of noise statistics, several adaptive Kalman filters (AKF) with noise-sensing capability were established [34,35]. Unfortunately, the existing methods can only achieve second-order estimation accuracy [36,37], and the adaptive KF cannot distinguish between model mismatch and inaccurate a priori measurement noise.
In addition, adaptive KFs use the innovation to adjust the process noise and measurement noise simultaneously, which leads to poor stability.
Inspired by the strong representation capability of neural networks [38,39], several filters based on reinforcement learning have been studied. Hu et al. [40] designed an attitude estimator by combining Lyapunov's method and a deep reinforcement learning algorithm. Tang et al. [41] combined the classic EKF with a deep reinforcement learning algorithm to realize the attitude estimation of a navigation system, which introduced a gain matrix of the residual and took it as the action to learn. Gao et al. [42] proposed an adaptive Kalman filter based on the deep deterministic policy gradient (DDPG) algorithm for ground vehicles, which took the integrated navigation system as the environment to obtain the process noise covariance matrix estimation. However, this adaptive filter takes the change in the noise vector as the action to learn, resulting in poor filter stability. Additionally, the process noise covariance matrix of the Inertial Navigation System (INS) error model will not experience a great change because it represents the inherent performance of the inertial sensor [42], which is different from the target tracking problem.
Motivated by the above investigation, a noise-adaption EKF is proposed in this paper to address the problem of maneuvering target tracking. Based on multiple synchronous sensors, the maneuver detection is constructed by utilizing Dempster-Shafer (D-S) evidence theory [43] to achieve the fused detection of multiple sensors. A Markovian decision process with a proper reward function is modeled for the adaptive estimation of the process noise covariance, and DDPG is designed to learn the compensation factor and feed it into EKF so that the improved filter can adaptively cope with the unknown maneuver. If the detection declares the occurrence of inaccurate measurement noise, the recursive measurement noise estimation is applied to the corresponding sensor. Finally, the local estimations are fused to obtain the target's global estimation. When an unknown maneuver or mismatched measurement noise emerges, the proposed filter can correct the noise covariance adaptively. Distinct from the aforementioned algorithms, the proposed filter avoids the simultaneous adjustment of the process model and measurement model without distinction, which effectively improves the robustness of the filter. Moreover, the application of the deep deterministic policy gradient method to maneuvering target tracking problems is explored in this paper.
The remainder of this paper is organized as follows: Section 2 introduces mathematical models and the formula of EKF. Section 3 provides the maneuver detection based on D-S evidence theory in detail, and the framework of the noise-adaption EKF is presented. In Section 4, the process noise adaption based on DDPG is designed. Moreover, the recursive measurement noise estimation and the fusion algorithm are completed. Finally, simulation results and conclusions are shown in Sections 5 and 6, respectively.

Problem Formulation
The process model of the target is given by

x_{k+1} = f(x_k) + w_k    (1)

where k denotes the index of discrete time, x_k represents the state vector, and f(·) is the state transition function. The process noise w_k is assumed to be zero-mean Gaussian white noise with covariance matrix Q_k. The measurement model of the sensors can be represented as

z_{k+1} = h(x_{k+1}) + v_{k+1}    (2)

where z_{k+1} is the measurement vector, h(·) is the measurement mapping function, and v_{k+1} is the measurement noise, whose covariance matrix is R_{k+1}. Since EKF has the advantages of a simple algorithm and fast convergence, it has been widely applied to nonlinear systems [17]. Specifically, EKF can be divided into two stages. The first stage is the one-step prediction based on the process model:

x̂_{k+1|k} = f(x̂_{k|k})    (3)

P_{k+1|k} = F_{k+1|k} P_{k|k} F_{k+1|k}^T + Q_k    (4)

where x̂_{k|k} is the state estimation and P_{k|k} is its covariance matrix; x̂_{k+1|k} and P_{k+1|k} are the corresponding predictions. The state transition matrix F_{k+1|k} is defined as the first-order Taylor expansion (Jacobian) of the state transition function evaluated at x̂_{k|k}.
The second stage is the one-step update based on the measurement. The state vector and its covariance matrix are updated as

x̂_{k+1|k+1} = x̂_{k+1|k} + K_{k+1} [z_{k+1} − h(x̂_{k+1|k})]    (5)

K_{k+1} = P_{k+1|k} H_{k+1}^T (H_{k+1} P_{k+1|k} H_{k+1}^T + R_{k+1})^{−1}    (6)

P_{k+1|k+1} = (I − K_{k+1} H_{k+1}) P_{k+1|k}    (7)

where K_{k+1} is the gain matrix, and H_{k+1} is defined as the Jacobian of the measurement mapping function evaluated at x̂_{k+1|k}. As the main prior information of the system, the process noise covariance Q and the measurement noise covariance R are of great significance to the estimation performance and stability of the filter: Q and R represent the confidence degree in the model and the measurements, respectively. If Q and R are smaller than the true noise distribution, the uncertainty range of the true state is too narrow, resulting in estimation bias. Conversely, if Q and R are larger than the true noise distribution, the filter may diverge. In either case, inaccurate Q and R impair the estimation accuracy.
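The two EKF stages above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the paper; the function names (ekf_predict, ekf_update) and the linearized f, F, h, H passed in are placeholders the caller supplies.

```python
import numpy as np

def ekf_predict(x, P, f, F, Q):
    """One-step prediction: propagate state and covariance through the process model."""
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def ekf_update(x_pred, P_pred, z, h, H, R):
    """One-step update: correct the prediction with the measurement."""
    innovation = z - h(x_pred)            # gamma_{k+1}
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # gain matrix, Eq. (6)
    x_upd = x_pred + K @ innovation
    P_upd = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_upd, P_upd, innovation, S
```

For a linear scalar model (f and h identity), one predict/update cycle reproduces the familiar Kalman gain P_pred / (P_pred + R).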

Maneuver Detection Based on D-S Evidence Theory
The key to maneuver detection is to design an appropriate detection strategy to detect the target's maneuver precisely and in a timely manner. As stated in the introduction, detection that depends on the innovation from single-source information is not reliable. To improve the accuracy of the maneuver detection, a maneuver detection scheme with a sliding-window structure based on D-S evidence theory is proposed in this section by introducing the information of multiple synchronous sensors.
D-S evidence theory is a fusion rule established on a nonempty finite space, which includes a limited number of subsets [43]. Through independent observations by sensors, it can fuse observation results and give a joint judgment to improve the confidence and accuracy of events. D-S evidence theory can combine evidence intuitively and easily, and it can express unknown and uncertain situations by assigning probability to sets of multiple events.
Consider all the subsets of the finite space Θ, denoted 2^Θ, including Θ itself and the empty set Φ, and define the map m: 2^Θ → [0, 1] satisfying

m(Φ) = 0,  Σ_{A⊆Θ} m(A) = 1

where the map m(·) is called the basic probability assignment function. m(A) is called the mass function of A, which indicates the degree of confidence in A according to the current observation.
For the problem of maneuvering target tracking, the target has two movement modes, maneuvering or not, so there are two events in the maneuver detection problem, "Maneuver" and "Normal", denoted as A_1 and A_2, respectively. However, when the degree of confidence tends to zero, unreasonable fusion results will be obtained, which is known as the Zadeh paradox [43]. To avoid the Zadeh paradox, one more event, "Uncertainty", is added, denoted as A_3 = {A_1, A_2}, and its confidence degree is set to 0.01.
Suppose there are N sensors applied in the target's measurement mission, denoted as S_i (i = 1, 2, ..., N), and the measurement of S_i at time k is z_{i,k}. The innovation is defined as the difference between the actual measurement and the predicted measurement:

γ_{i,k+1} = z_{i,k+1} − h(x̂_{k+1|k})    (12)

Following this, the detection variable is constructed based on the innovation:

D_{i,k+1} = γ_{i,k+1}^T S_{i,k+1}^{−1} γ_{i,k+1}

where S_{i,k+1} is the innovation covariance of S_i. Since the innovation is subject to a Gaussian distribution, the detection variable is subject to a χ² distribution with n degrees of freedom, where n is the dimension of the innovation vector. Therefore, the maneuver probability is defined as

p_{i,k+1} = F_{χ²_n}(D_{i,k+1})    (16)

where F_{χ²_n}(·) is the cumulative distribution function of the χ²_n distribution. As can be seen from Equation (16), the innovation is hypothetically considered to be entirely caused by the unknown maneuver, so an increase in the innovation indicates a larger maneuver probability.
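As a concrete sketch of the detection variable and the maneuver probability (assuming the 3-D range/azimuth/elevation measurement, so n = 3), the χ²₃ CDF has a closed form that needs only the standard library's erf; the function names here are illustrative:

```python
import math
import numpy as np

def chi2_cdf_3dof(d):
    """Chi-square CDF with 3 degrees of freedom (closed form, no SciPy needed)."""
    t = d / 2.0
    return math.erf(math.sqrt(t)) - math.sqrt(2.0 * d / math.pi) * math.exp(-d / 2.0)

def maneuver_probability(gamma, S):
    """Detection variable D = gamma^T S^{-1} gamma, mapped through the chi^2_3 CDF."""
    d = float(gamma @ np.linalg.inv(S) @ gamma)
    return chi2_cdf_3dof(d)
```

A larger normalized innovation drives the probability toward 1, matching the monotonicity argued in the text.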
Based on the definition of the maneuver probability, denoting the maneuver probability of S_i at time k+1 by p_{i,k+1}, the mass function of S_i is given as

m_{i,k+1}(A_1) = 0.99 p_{i,k+1},  m_{i,k+1}(A_2) = 0.99 (1 − p_{i,k+1}),  m_{i,k+1}(A_3) = 0.01

As mentioned above, a mismatch between the real measurement noise and the a priori measurement noise can also increase the innovation. As a result, the maneuver probability determined by the innovation alone is unreliable. A joint detection mechanism is developed by fusing the maneuver detection results of multiple synchronous sensors to improve the accuracy and reliability of the maneuver detection. For N independent pieces of evidence with N_e events, the fused mass function of A_1 is merged by D-S evidence theory as follows:

m_{k+1}(A_1) = (1/C) Σ_{A_{j_1} ∩ ··· ∩ A_{j_N} = A_1} Π_{i=1}^{N} m_{i,k+1}(A_{j_i}),  C = 1 − Σ_{A_{j_1} ∩ ··· ∩ A_{j_N} = Φ} Π_{i=1}^{N} m_{i,k+1}(A_{j_i})

where N_e = 3 in this paper. Similarly, the fused mass functions of A_2 and A_3 can be acquired, denoted as m_{k+1}(A_2) and m_{k+1}(A_3), respectively. If m_{k+1}(A_1) ≥ m_{k+1}(A_2), the target is detected to be maneuvering according to the fusion of multiple sensors at time k. Conversely, the target is not maneuvering when m_{k+1}(A_1) < m_{k+1}(A_2), and the mismatch of the a priori measurement noise is the main reason for the increased innovation.
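Dempster's combination over the frame {A_1, A_2, A_3 = {A_1, A_2}} can be sketched as below; storing masses as dicts keyed by frozensets is an implementation choice, not the paper's notation, and the 0.7/0.29/0.01 masses in the usage are hypothetical sensor outputs.

```python
def ds_combine(m1, m2):
    """Dempster's rule for two mass functions over subsets of a common frame.
    Keys are frozensets of atomic events; conflicting mass is renormalized away."""
    combined = {}
    conflict = 0.0
    for B, mB in m1.items():
        for C, mC in m2.items():
            inter = B & C
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mB * mC
            else:
                conflict += mB * mC
    norm = 1.0 - conflict
    return {A: v / norm for A, v in combined.items()}

# Hypothetical example: two sensors both leaning toward "maneuver".
A1 = frozenset({"maneuver"})
A2 = frozenset({"normal"})
A3 = A1 | A2                      # the "uncertainty" event
m = {A1: 0.7, A2: 0.29, A3: 0.01}
fused = ds_combine(m, m)
```

Fusing two moderately confident sensors sharpens the verdict: the fused mass on A_1 exceeds either individual mass, which is exactly the reinforcement effect the joint detection relies on. For N > 2 sensors the rule is applied pairwise and associatively.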
However, maneuver detection based only on the current time is unstable. Therefore, a sliding-window structure is introduced into the maneuver detection to improve its stability, so that the detection depends not only on the current state but also on the neighboring states.
The size of the sliding window is set as N_sw, and the final mass function m^f_{k+1}(·) is calculated over the window as

m^f_{k+1}(·) = (1/N_sw) Σ_{j=k+2−N_sw}^{k+1} m_j(·)

Three cases are then distinguished: (a) If m^f_{k+1}(A_1) ≥ m^f_{k+1}(A_2), the target is detected to be maneuvering, and the adaptive process noise estimation is introduced to deal with the unknown maneuver. (b) If m^f_{k+1}(A_1) < m^f_{k+1}(A_2) while the local detection of sensor S_i declares a maneuver, the target is not maneuvering and a mismatch of the measurement noise emerges, so the measurement noise adaption of S_i is necessary. (c) If neither of the above two conditions holds, the standard EKF can cope with the target tracking task.

The Framework of the Noise-Adaption EKF
Based on the above maneuver detection, a noise-adaption EKF is proposed to adaptively cope with the unknown maneuver and inaccurate measurement noise. The framework of the proposed filter is shown in Figure 1. To achieve accurate compensation for unknown maneuvers and the inaccurate measurement noise covariance, the multi-sensor information undergoes the maneuver detection. Whether the process noise adaption or the measurement noise adaption proceeds is determined by the detection result. If the target is detected to be maneuvering, the adaptive process noise is calculated by DDPG and introduced into the filter to cope with the model mismatch caused by the unknown maneuver. If the inaccurate measurement noise covariance is detected to be the main reason for the innovation's increase, the recursive measurement noise estimation is applied to the corresponding sensor. Subsequently, the local estimations are fused to obtain the global estimation of the target. When an unknown maneuver or mismatched measurement noise emerges, the proposed filter can estimate the maneuvering target's state with high performance.

Process Noise Adaption Based on DDPG
For the problem of maneuvering target tracking, the main task is to find a tracking strategy to estimate the target's state rapidly and accurately. However, accurate mathematical models of non-cooperative targets cannot be obtained. When the target is maneuvering, the estimation error increases due to the model mismatch. To cope with the unknown maneuver, an effective method is to adaptively estimate the process noise covariance online. To this end, a process noise adaption method based on DDPG is proposed in this section. Through its self-learning and optimization capabilities, the DDPG algorithm can adaptively cope with the unknown maneuver.
DDPG is known as one of the reinforcement learning algorithms for continuous action strategies. The interactive learning process of DDPG is formalized as a Markov decision process (MDP), which can be defined by four elements: the state S = {s_1, s_2, ...}, the action A = {a_1, a_2, ...}, the reward R = {r_1, r_2, ...}, and the state transition probability P(s_{k+1}|s_k, a_k). The framework of DDPG is shown in Figure 2. The agent first generates an action to interact with the environment; then, under the joint effect of the action and the environment, a new state is generated, and the environment returns a reward to the critic. Afterwards, the critic calculates the action-value function to evaluate the given policy, and the action policy is further optimized according to the evaluation results. The adaptive estimation of the process noise covariance is defined as

Q̂_k = λ_k Q_k

where Q̂_k is the adaptive process noise covariance, whose initial value is the a priori process noise covariance matrix, and λ_k > 1 is the compensation factor. This adaptive estimation form is more robust because it has fewer parameters. Correspondingly, the prediction of the covariance matrix in Equation (4) is updated as

P_{k+1|k} = F_{k+1|k} P_{k|k} F_{k+1|k}^T + Q̂_k

Following this, the problem of designing the maneuvering tracking strategy is transformed into the determination of the compensation factor, which can be described as an MDP. The DDPG algorithm is designed to learn the compensation factor, so the action is defined as

a_k = λ_k

The innovation reflects the degree of model mismatch: the larger the innovation, the larger the model error, and the compensation factor needs to be increased accordingly. Therefore, the agent state is defined as the filter's innovation in Equation (12), which contains the three-dimensional innovation of range, azimuth angle, and elevation angle, i.e.,

s_k = [γ_{k,r}, γ_{k,β}, γ_{k,ε}]^T

When the environment, i.e., the filter, receives an action, a sufficient reward mechanism is required to evaluate the filter performance under the current action.
Therefore, the filter residual is taken as the evaluation index, which is expressed as

ε_{k+1} = z_{k+1} − h(x̂_{k+1|k+1})

In practice, the residuals of the range measurements are usually much larger than those of the azimuth angle and elevation angle measurements, which leads to different sensitivities to different residual components. To make the filter equally sensitive to all residual elements, a weighting matrix W is introduced to normalize the residuals:

W = diag(1/σ_r, 1/σ_β, 1/σ_ε)

where σ_r, σ_β, and σ_ε represent the standard deviations of the range, azimuth angle, and elevation angle measurement errors, respectively. Therefore, the reward function is established as follows:

r_k = −tr[W ε_{k+1} ε_{k+1}^T W^T]

where tr[·] is the trace operation.
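The normalized reward can be sketched as below, assuming the reward is the negative trace of the weighted residual outer product as reconstructed above (equivalently, the negative squared norm of the normalized residual); the function name and argument order are illustrative.

```python
import numpy as np

def reward(residual, sigma_r, sigma_beta, sigma_eps):
    """Negative trace of W eps eps^T W^T, with W = diag(1/sigma) normalizing
    range/azimuth/elevation residuals to comparable scales."""
    W = np.diag([1.0 / sigma_r, 1.0 / sigma_beta, 1.0 / sigma_eps])
    e = W @ residual
    return -float(np.trace(np.outer(e, e)))   # equals -||W @ residual||^2
```

With the weighting, a 10 m range residual and a 0.001 rad angle residual contribute equally when the corresponding standard deviations are 10 m and 0.001 rad, which is the equal-sensitivity property the text asks for.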
In the DDPG algorithm, two neural networks, the actor network and the critic network, are utilized to approximate the action function and the action-value function, respectively. Denote A_µ(s_k) as the action function, which is parameterized by µ and maps states to actions, i.e., a_k = A_µ(s_k). Moreover, the action-value function is denoted as Q_ω(s_k, a_k), which is parameterized by ω. Additionally, to enhance the convergence of training, one target actor network and one target critic network are introduced into DDPG, denoted as A_{µ′}(s_k) and Q_{ω′}(s_k, a_k), respectively.
The optimal action policy is realized by maximizing the expected total reward, which is defined as

J(µ) = E[Σ_k ξ^k r_k]

Therefore, the actor function is optimized along the gradient of the expected total reward:

∇_µ J = E[∇_a Q_ω(s_k, a)|_{a=A_µ(s_k)} ∇_µ A_µ(s_k)]

Similarly, the action-value function is optimized by minimizing a loss function. The temporal-difference error δ_k is utilized to establish the loss:

δ_k = r_k + ξ Q_{ω′}(s_{k+1}, A_{µ′}(s_{k+1})) − Q_ω(s_k, a_k)

L(ω) = E[δ_k²]

where r_k is the reward and ξ is the discounting factor. The action-value function is optimized along the negative gradient of this loss. To improve the stability of the DDPG training process, a replay memory buffer R is used to store the training samples. At every training step, training samples with a size N_sample are utilized to train the actor and critic networks, and the current transition experience (s_k, a_k, r_k, s_{k+1}) is added into the buffer, replacing the oldest one.
The actor network is updated by the Adam optimizer, and the critic network is updated using the negative gradient of the loss function. At the end of each sample training, the two target networks are soft updated as

µ′ ← τµ + (1 − τ)µ′,  ω′ ← τω + (1 − τ)ω′

where τ is the updating rate. The implementation framework of the adaptive estimation of the process noise based on DDPG is shown in Figure 3, and the procedure is summarized in Algorithm 1.

Algorithm 1: The process noise adaption based on DDPG
1 Initialize the parameters of the actor network and the critic network
2 Initialize the target networks by copying the actor and critic networks
3 Initialize the replay memory buffer R
For each episode, perform the following steps:
4 Initialize the estimation state and its covariance matrix
For each timestep, perform the following steps:
5 Generate an action based on the actor network and the current state, a_k = A_µ(s_k) + N_k, where the exploration noise N_k is generated by an Ornstein-Uhlenbeck process
6 Execute the action, i.e., apply the compensation factor in the filter, to obtain a new state s_{k+1} and a new reward r_k
7 Store the sample (s_k, a_k, r_k, s_{k+1}) in the buffer R
8 Randomly select N_sample samples from the buffer
9 Calculate the temporal-difference error δ_k of each sample
10 Calculate the policy gradient
11 Update the actor network by the Adam optimizer
12 Update the critic network
13 Update the two target networks by soft update

Recursive Measurement Noise Estimation
When the sensor's maneuver detection result is inconsistent with the fused maneuver detection result, the corresponding sensor's measurement model is considered not to match a priori knowledge. In this case, the adaptive estimation method based on DDPG is no longer applicable to deal with it. Since the target is non-cooperative, the deviation between the process model and the actual model exists, so it is difficult to realize the optimal estimation of measurement noise in a similar way to the adaptive estimation of process noise. To modify the measurement noise adaptively, the recursive estimation of measurement noise is introduced [34].
The estimation of R̂_{k+1} can be obtained by maximizing the a posteriori density function, which is given by

R̂_{k+1} = (1/(k+1)) Σ_{j=1}^{k+1} [z_j − h(x̂_{j|j})][z_j − h(x̂_{j|j})]^T    (41)

The residual can be represented as

ε_{k+1} = z_{k+1} − h(x̂_{k+1|k+1})    (42)

Substituting Equation (42) into Equation (41) yields

R̂_{k+1} = (1/(k+1)) Σ_{j=1}^{k+1} ε_j ε_j^T

The expectation of the measurement noise can be expressed as

E[v_{k+1} v_{k+1}^T] = E[ε_{k+1} ε_{k+1}^T] + H_{k+1} P_{k+1|k+1} H_{k+1}^T

(the detailed derivation can be found in the appendix of [34]). The statistical residual covariance is calculated as the variance of the historical residual sequence. Accordingly, to approximate the real measurement covariance, the modified measurement covariance is updated in the following recursive form:

R̂_{k+1} = (1 − ρ_{k+1}) R̂_k + ρ_{k+1} (ε_{k+1} ε_{k+1}^T + H_{k+1} P_{k+1|k+1} H_{k+1}^T),  ρ_{k+1} = (1 − b)/(1 − b^{k+1})

where ρ_{k+1} is the weighting factor and b is a fading factor. The initial value of R̂_k is the a priori measurement noise covariance matrix. Therefore, the gain matrix in Equation (6) is replaced by

K_{k+1} = P_{k+1|k} H_{k+1}^T (H_{k+1} P_{k+1|k} H_{k+1}^T + R̂_{k+1})^{−1}

By modifying the measurement covariance, the gain matrix is adjusted, and the impact of inaccurate a priori measurement noise is reduced.
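The fading-memory recursion can be sketched as follows. This is a Sage-Husa-style sketch assuming the reconstructed correction term ε ε^T + H P H^T for the post-fit residual; the fading factor value b = 0.96 in the signature is a placeholder, not a value taken from the paper.

```python
import numpy as np

def fading_weight(k, b=0.96):
    """Weighting factor rho_{k+1} = (1 - b) / (1 - b^{k+1}); equals 1 at k = 0
    and decays toward (1 - b), i.e., old estimates fade geometrically."""
    return (1.0 - b) / (1.0 - b ** (k + 1))

def update_R(R_prev, residual, H, P_upd, k, b=0.96):
    """One step of the recursive measurement noise estimate."""
    rho = fading_weight(k, b)
    sample = np.outer(residual, residual) + H @ P_upd @ H.T
    return (1.0 - rho) * R_prev + rho * sample
```

At k = 0 the recursion discards the prior entirely (ρ = 1); for large k it settles into an exponentially weighted average with weight 1 − b on the newest sample.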

Fusion Algorithm
The local estimations from multiple sensors are fused in this section. Since the cross-covariance matrices of different sensors are unknown, the fault-tolerant generalized convex combination algorithm (FGCC) [44] based on information theory is introduced.
For two local estimations x̂_1 and x̂_2, the fusion rule of FGCC is

x̂ = υ_1 x̂_1 + υ_2 x̂_2

where P_1 and P_2 are the corresponding covariance matrices, υ_1 and υ_2 are the weighting parameters, and δ is an adaptive parameter constructed as

δ = [H(P_1) + H(P_2)] / [H(P_1) + H(P_2) + I(P_1, P_2)]    (52)

In the above equation, H is the Shannon entropy, and I is the symmetrized Kullback-Leibler distance between the two distributions, known as the J-divergence.
H(P_i) = (1/2) ln[(2πe)^n |P_i|],  I(P_1, P_2) = D(P_1‖P_2) + D(P_2‖P_1)

where n is the number of state vector dimensions, and D is the Kullback-Leibler divergence.
D(P_i‖P_j) = (1/2)[tr(P_j^{−1} P_i) + d_{x,ij}^T P_j^{−1} d_{x,ij} − n + ln(|P_j|/|P_i|)]

where d_{x,ij} = x̂_i − x̂_j, and i, j ∈ {1, 2}, i ≠ j. The weighting parameters υ_1 and υ_2 are then determined from the adaptive parameter δ and the covariance matrices; the explicit expression is given in [44]. Obviously, the above algorithm can only be applied to the fusion of two local estimations. In order to improve the applicability of the fusion algorithm, an extended fault-tolerant generalized convex combination algorithm (EFGCC) is developed, which provides the general fusion rule for N > 2.
Suppose there are N local estimations x̂_i (i = 1, ..., N) with corresponding covariance matrices P_i. The fusion form of EFGCC is given as

x̂ = Σ_{i=1}^{N} υ_i x̂_i    (57)

where the adaptive parameter δ is updated as

δ = Σ_{i=1}^{N} H(P_i) / [Σ_{i=1}^{N} H(P_i) + Σ_{i<j} I(P_i, P_j)]

The definitions of the Shannon entropy and the symmetrized Kullback-Leibler distance are the same as above, and the weighting parameters υ_i are determined analogously to the two-sensor case. Through the fusion rule of Equation (57), the global estimation for maneuvering targets is realized.
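The information-theoretic ingredients of the adaptive parameter δ can be sketched for the two-sensor case; the Gaussian entropy and KL divergence below are the standard closed forms, while the helper names (gauss_entropy, kl_div, adaptive_delta) are illustrative.

```python
import numpy as np

def gauss_entropy(P):
    """Shannon entropy of an n-D Gaussian with covariance P."""
    n = P.shape[0]
    return 0.5 * np.log(((2.0 * np.pi * np.e) ** n) * np.linalg.det(P))

def kl_div(x0, P0, x1, P1):
    """Kullback-Leibler divergence D(N(x0,P0) || N(x1,P1))."""
    n = P0.shape[0]
    d = x1 - x0
    P1_inv = np.linalg.inv(P1)
    return 0.5 * (np.trace(P1_inv @ P0) + d @ P1_inv @ d - n
                  + np.log(np.linalg.det(P1) / np.linalg.det(P0)))

def adaptive_delta(x1, P1, x2, P2):
    """delta = (H1 + H2) / (H1 + H2 + J), with J the symmetrized KL distance."""
    H = gauss_entropy(P1) + gauss_entropy(P2)
    J = kl_div(x1, P1, x2, P2) + kl_div(x2, P2, x1, P1)
    return H / (H + J)
```

When the two local estimates agree exactly, J = 0 and δ = 1; as the estimates diverge, J grows and δ shrinks, which is how FGCC down-weights inconsistent sources.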

Simulation
In this section, the effectiveness and superiority of the proposed tracking method are demonstrated through numerical simulations.

Target Model
In the simulation experiments, the Singer model [7] is adopted as the process model, which describes the target acceleration as colored noise rather than white noise; the acceleration is assumed to be first-order time-correlated. The state vector is x_k = [x_k, v_{x,k}, a_{x,k}, y_k, v_{y,k}, a_{y,k}, z_k, v_{z,k}, a_{z,k}]^T, which includes the position, velocity, and acceleration in three directions.
The state transition matrix is block-diagonal, with the per-axis block

F_axis = [ 1  T  (αT − 1 + e^{−αT})/α² ;
           0  1  (1 − e^{−αT})/α ;
           0  0  e^{−αT} ]

where α is the target's maneuvering frequency, and T is the sampling period. The process noise covariance matrix Q_k of the Singer model is calculated from α, T, and the acceleration variance; the explicit elementwise expressions can be found in [7]. The measurement vector is defined as z_k = [r_k, β_k, ε_k]^T, where r_k is the range, β_k is the azimuth angle, and ε_k is the elevation angle, obtained from the Cartesian position through the measurement mapping function h(·). The initial covariance matrix of the measurement noise is set from the a priori measurement errors as R_0 = diag(σ_r², σ_β², σ_ε²).
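The per-axis Singer transition block and its 9-state block-diagonal assembly can be sketched as below; the function names are illustrative, and the Kronecker product is simply a compact way to place the same block on each of the x/y/z axes.

```python
import numpy as np

def singer_F(alpha, T):
    """Per-axis Singer transition block for [position, velocity, acceleration]."""
    e = np.exp(-alpha * T)
    return np.array([
        [1.0, T, (alpha * T - 1.0 + e) / alpha**2],
        [0.0, 1.0, (1.0 - e) / alpha],
        [0.0, 0.0, e],
    ])

def singer_F_3d(alpha, T):
    """Block-diagonal transition matrix for the 9-state x/y/z Singer model."""
    return np.kron(np.eye(3), singer_F(alpha, T))
```

For large αT the acceleration decorrelates quickly (the (3,3) entry e^{−αT} → 0), recovering nearly constant-velocity behavior, while small αT approaches the constant-acceleration model.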

Construction of Neural Networks
In the proposed process noise estimation based on DDPG, the actor network is used to learn the action policy from the input of the innovation [γ_{k,r}, γ_{k,β}, γ_{k,ε}]^T to the output of the compensation factor λ_k, and the critic network is used to obtain the action-value function based on the innovation and the compensation factor. Inspired by [45,46], four-layer fully connected networks are commonly utilized in deep reinforcement learning applications. Hence, the structure of the actor network is designed as shown in Figure 4, and the critic network uses the same network structure. Too few neurons lead to underfitting; conversely, too many neurons lead to overfitting and increased training time. By referring to [42,45,46] and simulation experiments, the sizes of the neural networks are set as in Table 1. The activation function is the tanh function, which is given by

tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x})

Simulation parameters are summarized in Table 2. Table 2. Simulation parameters of DDPG.

Parameters                      Value
Discounting factor ξ            0.99
Updating rate τ                 0.001
Number of samples N_sample      64
Number of episodes              500
Number of timesteps             1000

The simulation environment is defined as 6000 km × 100 km × 6000 km, in which the target tracking task is carried out. The training task includes 500 random episodes, and each episode runs 1000 timesteps. The maximum acceleration of the target is 30 m/s², and the minimum acceleration is −30 m/s². For each episode, the target randomly maneuvers with accelerations uniformly distributed between the minimum and the maximum. According to the definitions of the training environment and the maneuver space, the training scenarios are constructed.
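A plain-NumPy forward pass through a four-layer actor of the kind described above can be sketched as follows. The hidden sizes [3, 64, 64, 1] and the output range [1, 10] for the compensation factor are hypothetical placeholders (Table 1's actual sizes are not reproduced here); the tanh output is simply rescaled so that λ > 1 as the method requires.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random (weight, bias) pairs for a fully connected net, e.g. sizes=[3,64,64,1]."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def actor_forward(params, s, lam_min=1.0, lam_max=10.0):
    """Map the 3-D innovation state to a compensation factor lambda in (lam_min, lam_max)."""
    h = s
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)          # hidden layers with tanh activation
    W, b = params[-1]
    out = np.tanh(h @ W + b)            # output in (-1, 1)
    return lam_min + (out + 1.0) * 0.5 * (lam_max - lam_min)
```

In training, the Adam updates and the Ornstein-Uhlenbeck exploration noise of Algorithm 1 would sit on top of this forward pass; a deep learning framework would normally supply the gradients.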

Simulation Results
To demonstrate the superiority of the proposed filter, IMM [9], STF [22], RNSTF [33], and AKF [34] are simulated for comparison. Two simulation scenarios are conducted in this section to illustrate the feasibility of the proposed noise-adaption EKF. In the first scenario, the target makes continuous time-varying maneuvers. In the second scenario, the target is set to make abrupt maneuvers to verify the performance against strong maneuvers.
In the simulation, the fading factor in the recursive measurement noise estimation and in AKF is b = 0. The root mean square error (RMSE) is used to evaluate the tracking accuracy:

RMSE_k = sqrt[ (1/N_MC) Σ_{m=1}^{N_MC} (x̂_k^{(m)} − x_k)² ]

where x̂_k^{(m)} is the estimated value in the m-th run, and x_k is the real value. N_MC is the number of Monte Carlo experiments, which is 100 in the following simulations.
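The Monte Carlo RMSE above is a one-liner in NumPy; the array shapes assumed here ((N_MC, N_steps) estimates against an (N_steps,) truth) are an illustration of the convention, not the paper's code.

```python
import numpy as np

def rmse(estimates, truth):
    """Per-timestep Monte Carlo RMSE.
    estimates: (N_MC, N_steps) array of estimates across runs.
    truth:     (N_steps,) array of true values."""
    err = estimates - truth[None, :]
    return np.sqrt(np.mean(err**2, axis=0))
```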

Continuous Time Varying Maneuver
In this scenario, the target's acceleration is continuously time-varying in sinusoidal form, which can be described as

a(t) = A sin(ωt + ϕ_0)

where A is the amplitude, ω is the frequency, and ϕ_0 is the initial phase. The entire flight process lasts 1000 s. The initial position is [800 km, 80 km, 800 km], and the initial velocity is [−300 m/s, 10 m/s, −300 m/s]. The sinusoidal parameters are set as shown in Table 3. Three sensors located at the origin track the maneuvering target with a sampling period of 1 s, and the a priori measurement errors are set as shown in Table 4. Furthermore, to verify the performance of the proposed noise-adaption EKF under the condition of inaccurate measurement noise, the magnitude of the range measurement noise of S_1 is enlarged five times during 500-600 s, which is unknown to the filter. The algorithms mentioned above are utilized for the tracking solution, and the same fusion method, i.e., EFGCC, is adopted to obtain the fused trajectories. For convenience, the proposed noise-adaption EKF is denoted as NAEKF in the following simulation results. The estimated trajectories are shown in Figure 5, and the position estimation errors are presented in Figure 6. In order to correspond to the measurements, the position estimation errors are given in terms of range, azimuth angle, and elevation angle. The position errors of NAEKF are smaller than those of the other filters, especially in the stage of inaccurate measurement noise. Correspondingly, Figure 7 shows the σ-boundaries of the position estimation errors. At the beginning of the simulation, the position errors are generally large due to initialization errors, but the estimation accuracy gradually improves over time. After 100 s, the range error of the proposed filter converges to 3.27 m, the azimuth angle error reaches 0.0033°, and the elevation angle error stabilizes around 0.0029°. Due to the effect of the inaccurate measurement noise between 500-600 s, the stable value of the range error reaches 3.54 m.
Similarly, the azimuth angle error increases slightly and finally stabilizes at 0.0043°, and the elevation angle error stabilizes at 0.0043°. To compare the local estimation accuracy of the different algorithms, the detailed estimation errors are shown in Tables 5-7. Taking the range estimation of S_1 as an example, the estimation accuracy of NAEKF is 8.95%, 21.28%, and 20.90% higher than that of IMM, STF, and RNSTF, respectively, and the estimation error of AKF is almost seven times that of NAEKF. Obviously, the proposed filter has better tracking accuracy than the other filters. Figures 8 and 9 show the velocity estimation errors and the corresponding σ-boundaries in three directions, respectively. As shown in the simulation results, the velocity estimation accuracies of IMM, STF, RNSTF, and AKF are worse than that of NAEKF, as can also be seen from Figure 9. The acceleration estimation errors and the corresponding σ-boundaries are presented in Figures 10 and 11, respectively. The average acceleration estimation accuracy of NAEKF is 1.27 m/s², whereas that of IMM, STF, RNSTF, and AKF is around 1.86 m/s², 4.20 m/s², 4.11 m/s², and 28.01 m/s², respectively. It can be confirmed that the proposed NAEKF effectively deals with unknown sinusoidal maneuvers and achieves better tracking performance than IMM, STF, RNSTF, and AKF. Table 8 provides the computing times of the five algorithms. Since NAEKF adds steps such as the maneuver detection, its computing time is slightly longer than that of STF and RNSTF. However, NAEKF takes less computing time than IMM and AKF, because the parallel calculation of the sub-filters in IMM and the simultaneous noise estimation in AKF are time-consuming. On the basis of the above experiment settings, the magnitudes of the angle measurement noises of S_1 are also enlarged five times during 500-600 s. Simulation results show that the matrix singularity problem of AKF occurs in this simulation experiment.
As described in the introduction, AKF shows poor stability because it updates the process noise and measurement noise simultaneously without distinction. Therefore, only IMM, STF, RNSTF, and NAEKF are presented in this experiment. Figures 12 and 13 show the position estimation errors and the corresponding σ-boundaries. Although the inaccurate measurement noises lead to certain increases in the estimation errors, NAEKF shows better estimation accuracy and robustness than the other three algorithms. Taking the azimuth angle as an example, the estimation error of NAEKF is 0.0068°, while that of IMM reaches 0.0098°. Additionally, the estimation errors of STF and RNSTF are 0.0159° and 0.0158°, respectively, which is more than twice the estimation error of NAEKF. The estimation results of the velocity and acceleration are shown in Figures 14-17. It is obvious that the proposed NAEKF effectively deals with inaccurate measurement noises in this experiment and achieves the best estimation accuracy among the compared algorithms. The velocity estimation errors of IMM, STF, and RNSTF are several times those of NAEKF, and the same conclusion can be drawn from the acceleration estimation results.

Abrupt Maneuver
In this scenario, the target's acceleration is set to be abrupt, and the maneuver parameters are shown in Table 9. The initial position is [6000 km, 15 km, 6000 km], and the initial velocity is [−1000 m/s, 0 m/s, −1000 m/s]. The measurement errors are the same as in Table 4, and the magnitude of the range measurement noise of S1 is enlarged five times during 700-800 s. It should be noted that the matrix singularity problem of AKF occurs in this simulation scenario. Therefore, only IMM, STF, RNSTF and NAEKF are compared in this section. The tracking results are shown in Figure 18.
The position estimation errors and the corresponding σ-boundaries are shown in Figures 19 and 20, respectively. Simulation results show that the proposed filter can track the abrupt maneuvering target effectively. Owing to the D-S maneuver detection, the proposed noise-adaption EKF can detect unknown maneuvers and inaccurate measurement noise simultaneously. As a result, the proposed filter can utilize the process noise adaption based on DDPG to adaptively cope with the unknown maneuver. It can be seen from Figure 19a that abrupt maneuvers cause several error increases in the tracking process, but they are quickly eliminated. When inaccurate measurement noise is detected, the proposed filter utilizes the recursive measurement noise adaption to weaken its influence. Certain increases in estimation errors are caused by the inaccurate measurement noise during 700-800 s, and the estimation errors of STF and RNSTF increase remarkably when it appears. The estimation errors of the different algorithms are shown in Tables 10-12. It is evident that the proposed filter achieves higher estimation accuracy than IMM, STF and RNSTF in both local estimation and fusion estimation.

For the velocity estimation, however, the error in the y direction reaches 146.70 m/s, because the maneuver in the y direction is much larger than in the other directions. The velocity estimation errors of STF and RNSTF are significantly larger than that of NAEKF. In contrast, IMM achieves better estimation accuracy than STF and RNSTF, but is still worse than NAEKF. According to the acceleration estimation results in Figure 23, there are several peaks of acceleration tracking errors during abrupt maneuvers, but they decrease rapidly. The σ-boundaries of the acceleration estimation errors are shown in Figure 24. The acceleration estimation accuracy in the x direction of NAEKF is 4.97 m/s², whereas that of IMM, STF and RNSTF finally stabilizes at 10.23 m/s², 12.31 m/s² and 12.49 m/s², respectively.
Similarly, the acceleration estimation accuracy in the z direction of NAEKF is 5.06 m/s², while that of IMM, STF and RNSTF is around 10.38 m/s², 12.02 m/s² and 12.24 m/s², respectively. Due to the greater maneuver, the acceleration estimation accuracy in the y direction is worse than that in the other directions. Specifically, the acceleration estimation error of NAEKF reaches 10.27 m/s², while that of IMM, STF and RNSTF is around 14.66 m/s², 18.93 m/s² and 18.32 m/s², respectively. Table 13 provides the computing times of the four algorithms. As in the first scenario, the computing time of NAEKF is shorter than that of IMM, but slightly longer than that of STF and RNSTF.
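The recursive measurement noise adaption invoked in both scenarios can be illustrated with a Sage-Husa-style estimator that corrects the a priori covariance R from the innovation sequence. This is a minimal sketch of one common recursive form; the forgetting factor `b = 0.98` and the symmetrization guard are illustrative assumptions, and the paper's exact weighting may differ:

```python
import numpy as np

def update_R(R_prev, innovation, H, P_pred, k, b=0.98):
    """Sage-Husa style recursive estimate of the measurement noise covariance.
    innovation: measurement residual nu_k; H: measurement Jacobian;
    P_pred: predicted state covariance; k: time step (k >= 1);
    b: forgetting factor in (0, 1) that discounts old innovations."""
    d = (1.0 - b) / (1.0 - b ** k)  # time-varying weight on the new sample
    nu = innovation.reshape(-1, 1)
    R_new = (1.0 - d) * R_prev + d * (nu @ nu.T - H @ P_pred @ H.T)
    # Symmetrize to guard against numerical asymmetry in this simplified sketch
    return 0.5 * (R_new + R_new.T)

# Hypothetical 2-D measurement example
R = update_R(np.eye(2), np.array([0.5, -0.2]), np.eye(2),
             0.1 * np.eye(2), k=10)
```

When the enlarged measurement noise of S1 appears, the innovations grow, the `nu @ nu.T` term dominates, and the estimated R of that sensor inflates, which down-weights its measurements in the update step instead of corrupting the process model.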

Conclusions
To address the problem of maneuvering target tracking with inaccurate measurements, a noise-adaption EKF based on DDPG is proposed in this paper. The proposed filter avoids the simultaneous, undifferentiated adjustment of the process model and the measurement model, which effectively improves its robustness. Within the framework of the noise-adaption EKF, maneuver detection based on D-S evidence theory is constructed to distinguish between unknown maneuvers and inaccurate measurement noise by fusing multi-sensor information. Moreover, a Markovian decision process of maneuver tracking is established to cope with unknown maneuvers, and DDPG is developed to learn the adaptive estimation of the compensation factor and feed it to the EKF. In addition, recursive measurement noise estimation is applied to correct the a priori measurement noise covariance online. Finally, the local estimations are fused to achieve the global estimation of multiple sensors. The simulation results indicate that the proposed noise-adaption EKF is effective in both the continuous time-varying maneuver scenario and the abrupt maneuver scenario, and that it achieves better tracking performance than IMM, STF, RNSTF and AKF.
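As a minimal illustration of the MDP interface summarized above, the sketch below shows how the DDPG action (the compensation factor) could scale the nominal process noise covariance, and how a reward could be formed from the normalized innovation. The function names and the quadratic reward shape are illustrative assumptions, not the paper's exact design; a trained actor network would supply the compensation factor online:

```python
import numpy as np

def adapt_Q(Q0, compensation_factor):
    """Scale the nominal process noise covariance by the DDPG action.
    A factor > 1 inflates Q when a maneuver is detected, letting the
    filter trust the measurements more during the maneuver."""
    return compensation_factor * Q0

def reward(innovation, S):
    """Hypothetical reward: penalize the normalized innovation
    nu^T S^{-1} nu, so the agent learns a compensation factor that
    keeps the filter statistically consistent."""
    nu = innovation.reshape(-1, 1)
    return -float(nu.T @ np.linalg.inv(S) @ nu)
```

In this interface the innovation plays the role of the state, the compensation factor is the action, and a small normalized innovation yields a reward near zero, matching the MDP formulation described in the paper.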