Trajectory Optimisation for Cooperative Target Tracking with Passive Mobile Sensors

: The paper considers the problem of tracking a moving target using a pair of cooperative bearing-only mobile sensors. Sensor trajectory optimisation plays the central role in this problem, with the objective to minimize the estimation error of the target state. Two approximate closed-form statistical reward functions, referred to as the Expected Rényi information divergence (RID) and the Determinant of the Fisher Information Matrix (FIM), are analysed and discussed in the paper. Being available analytically, the two reward functions are fast to compute and therefore potentially useful for longer horizon sensor trajectory planning. The paper demonstrates, both numerically and from the information geometric viewpoint, that the Determinant of the FIM is a superior reward function. The problem with the Expected RID is that the approximation involved in its derivation signiﬁcantly reduces the correlation between the target state estimates at two sensors, and consequently results in poorer performance.


Introduction
Multiple passive sensors can be cooperatively used in a target tracking system to achieve target observability. In general, the performance of such a tracking system is correlated with the states of these sensors [1]. For example, a 2D moving target can be observed by two cooperative bearing-only sensors, while they exhibit observability issues individually [2]. In this case, the track accuracy is highly dependent on the locations of the sensors when taking measurements. A crucial step in the tracking is to plan the moving sensor trajectories ahead of measurement process in a way which will minimise tracking error at a future time when the measurements taken by the two sensors on the planned trajectories are used [3,4].
The sensor trajectory planning problem is also known as trajectory scheduling or optimisation in the literature. It can be cast as a partially observed Markov decision process (POMDP) [5,6], where the decision process is carried out by minimising the cost or maximising the reward against a measure criterion that is related to the Fisher information [7][8][9] or mutual information [10,11]. Ghassemi and Krishnamurthy describe a method in [12], where they use a set of orthogonal basis functions, i.e., the Chebychev polynomial series as suggested in [13] for the control space parameterisation to replace the brute force search for N-step planning ahead. Logothetis et al. proposed an information theoretic approach to sensor scheduling in [10]. The sensor is steered so that the mutual information between the posterior density of the target and the measurement sequence at a desired future time is maximized. Practical algorithms for sensor scheduling were discussed in [11]. Morelande et al. proposed an alternative approach in [14], which calculates the accumulated reward, defined by the expectation of information divergence between the posterior and prior densities of target, for each possible sensor path and the sensor maneuvers are controlled by the action that maximises the accumulated reward. In general, computing reward functions require the evaluation of future measurements. Of course, it is impossible to give an exact computation of the future reward function since future measurements are inaccessible. One may mitigate this problem by using (conditionally) expected values of the reward at future times, or by using Monte Carlo sampling to numerically evaluate future measurements [15,16].
In this paper, we investigate the problem of tracking a moving target via cooperative passive sensors, where the tracking error is a function of sensor trajectories. The focus is on two statistical reward functions for passive sensing, both available in the analytic form. They are referred to as the Expected Rényi information divergence (RID) and the determinant of the Fisher information matrix (FIM). Being available analytically, they are potentially useful for a longer horizon sensor trajectory planning, even on the platforms with limited computational resources. The paper investigates the implications of the approximations involved in derivation of the two reward functions. The analytic expression for the Expected RID is derived assuming a linear-Gaussian case. We show that this approximation significantly reduces the cross-correlation of the tracking error between the two sensor locations, and consequently results in the performance similar to that of the trace of the FIM. On the other hand, the determinant of FIM, where the ground truth is approximated by the prediction of the target state, taking correctly into account the crosscorrelation between the two sensors. We also discuss this problem from the information geometric viewpoint by illustrating the geometric properties of the FIM corresponding to the statistical reward function. Finally, we compare the performance of the two reward functions by a numerical example in which a (non-cooperative) target is being chased by two cooperative bearing-only sensors. Preliminary work on this subject was reported in [17].
The paper is organised as follows: the sensor trajectory scheduling problem of interest is described in Section 2. The analytic expressions for the two reward functions, i.e., the Expected RID and the determinant of FIM are derived and their performance discussed in Section 3. In Section 4, we further compare the performance difference between the two reward functions from the information geometric viewpoint. Simulation results are presented in Section 5. Finally, the concluding remarks are given in Section 7.

Problem Definition
Let us consider the problem of two-dimensional target tracking using measurements from two cooperative bearing-only sensors as shown in Figure 1. The target state, which consists of both target location (x k , y k ) and velocity (ẋ k ,ẏ k ), The system and measurement models are where the system transition matrix is and w k and v k are system and measurement noises, respectively. These are assumed to be independent Gaussian distributions, that is, where Q k is assumed to be known and . σ 1 and σ 2 are the standard deviations of measurement noises for sensors 1 and sensor 2, respectively.
The measurement function in (2) is expressed as where (x S 1 k , y S 1 k ) and (x S 2 k , y S 2 k ) are the locations of sensor 1 and sensor 2 at t k , respectively. The objective of target tracking is to estimate the target posterior probability density function (PDF) p(x k |β 1:k ) based on the bearings-only measurement sequence β 1:k up to current time t k and the prior PDF p(x k−1 |β 1:k−1 ). Clearly, the measurement model (4) is sensor state dependent, and so tracking accuracy depends on where the two sensors take measurements (i.e., on their motion). Our goal is to find the optimal future motion for the two sensors within the limitations imposed by the sensor platform dynamics, so as to minimize the future expected track error. This is a difficult problem because of the many sources of uncertainty involved in decision-making: the uncertainty in the current target state estimate as well as the uncertainty in the future target motion and measurements.
A practical way to implement the sensor trajectory optimisation procedure is to assume that sensor platforms move at constant speeds, v 1 and v 2 , respectively, and that they have a small set U of admissible course corrections at time t k . Each element u k of U is a pair u k = [u 1,k , u 2,k ] , where u i,k represents the course correction of the ith sensor at k. This results in the velocities of the two sensors in the local coordinates in the time interval (t k , t k+1 ] asẋ The assumption of constant speeds between sampling intervals is a standard practice for system analysis of discrete data sampling without loss of optimality. The analysis tends to be optimal at a high sampling rate. In summary, the underlying target tracking is a POMDP and involves an N-step ahead process of sensor trajectory planning, which computes the optimal action sequence u * k+1:N = {u * k+1 , · · · , u * k+N } such that the tracking error is minimised at the future time t k+N . A generic recursive process of sensor trajectory scheduling and target state estimation at discrete-time k ≥ 1 involves the steps listed in Algorithm 1: Algorithm 1 Sensor scheduling steps at discrete-time k 1: Inputs: The posterior at k, p(x k |β 1:k ); the sensor locations (x s i k , y s i k ) and course corrections u i,k , i = 1, 2.
2: Propose the set of admissible sensor actions U k+1:N 3: Perform scheduling based on an optimality criterion and obtain the optimal sequence of future sensor actions u * k+1:N ∈ U k+1:N for a future time period (t k+1 , t N ).

Reward Functions
The sensor trajectory scheduling is part of the underlying target tracking process, involving the computation of a sequence of future actions (sensor course corrections) based on the current target and sensor states. The key role in this process plays the criterion for measuring the reward of the future action. The reward functions used in the literature are based on the information theoretic measures. One criterion is the mutual information between the posterior probability density of the estimated target state and the measurement sequence, which we seek to maximize. A theoretic description is given in [18] and sub-optimal strategies are discussed in [19]. An alternative criterion is to maximize the information divergence between the prior and posterior probability densities of estimated target state [14]. As the ultimate goal of sensor trajectory optimisation is to improve target tracking accuracy, the posterior Cramer-Rao low bound [20] is also used as a cost function to take the estimator performance into account as well. In addition, we can also maximize the determinant or trace of the FIM [13]. All of these criteria are consistent in the sense that they maximize the Fisher information of the underlying system.
In this work, we investigate two reward functions for the problem at hand, which can be derived analytically: the Expected RID and the Determinant of the FIM. For the purpose of comparison, we also introduce the reward function based on the Trace of the FIM. The goal is to find a reward function which is both computationally efficient and reflects correctly the information change associated with various sensor trajectory hypotheses.

Expected RID
The Rényi information divergence [21] is the information divergence for probability densities f 1 (x) and f 2 (x), for α ≥ 0, For the underlying system with one-step ahead sensor scheduling, f 1 = p(x k+1 |β 1:k+1 ) signifies the posterior density at k + 1 and f 2 = p(x k+1 |β 1:k ) signifies the predicted density at k + 1, where β k+1 is the future measurement at time k + 1 given by (2). Thus, (6) becomes The evaluation of the integral (7) requires the future measurement β k+1 , which may be obtained numerically via Monte Carlo simulations [15,16]. A statistical solution in a closed-form can be derived for the linear Gaussian case [14], where the dependence on the future measurement β k+1 is removed by taking the expectation w.r.t. p(β k+1 |β 1:k ).
The linear Gaussian case is described by the transition and measurement y k |x k ∼ N (y k ; Hx k , R).
The predicted measurement distribution is then given by where S k+1 = HP k|k−1 H + R.
Assume that the sequence of measurements y 1:k has been collected so that the posterior density is given by where P k|k is the updated covariance at k. For the properties of the Kalman filter, the divergence between the prior PDF and posterior PDF at k + 1 is given by Taking the expectation of (12) with respect to p(y k+1 |y 1:k ) and using the formula we obtain the closed-form Expected RID function [22]: where Ψ k+1 = R −1 HP k+1|k H and P k+1|k is the covariance matrix associated with the predicted state estimatex k+1|k . For the bearings-only problem described in Section 2, see Equations (1) and (2), the Expected RID can only be approximated via linearisation, i.e., H in (14) is replaced by the Jacobian of (4) Note that P k+1|k is obtained from the underlying tracker [23] and R is replaced by Σ k associated with (2).

Determinant of Fisher Information Matrix
In general, it is non-trivial to derive the FIM or its determinant in a closed-form. For this particular case, the measurement model (4) is independent of target velocity and the resulting FIM is only of rank 2. In consequence, we only need to consider the target position components in derivation of the FIM, and therefore the target state vector x k in derivation will denote just the target position components ([x k , y k ] ). A closed form expression for the determinant of the FIM is analytically tractable as follows.
Suppose that the posterior at k − 1 is p(x k−1 |β 1:k−1 ). Using the current sensor locations and a proposed action u k , one can compute the one-step ahead sensor locations (x s i k , y where ln p(β|x) is the log-likelihood, which is obtained from Equation (2), and G k is a 2 × 2 matrix with entries Under the Gaussian measurement noise assumption, we have Therefore, The determinant of G k is a function of σ 1 , σ 2 , x k , x s 1 k and x s 2 k . From the definition and using (19)-(21), we have Calculation of the FIM requires knowledge of the target location at future time k, which in practice is approximated using the predicted target statex k based on the point estimate from p(x k−1 |β 1:k−1 ) and the system dynamics (1).

Trace of Fisher Information Matrix
In view of (19)- (21), the trace of the FIM is given by The trace of the FIM is a widely used objective function for sensor information maximisation [24], but it is more conservative than the determinant of the FIM since it does not take into account the off-diagonal terms of FIM. It will be used to explain the behaviour of the Expected RID function.

Remarks on the Reward Functions
Both the Expected RID and the Determinant of FIM involve approximations: the former using linearisation, the latter using the predicted target state in place of the true state. The following analysis highlights the difference between the two reward functions after approximation.
To compare the performance difference of the reward functions, we fix the location of Sensor 1 at S 1 = (50, 300), and plot the reward of one step ahead versus the location of Sensor 2. In the simulation, P is set to be a constant, σ 1 = 0.175 (rad) and σ 2 = 0.2618 (rad). The contour plots for the Expected RID and Determinant of FIM are in Figure 2a,b, respectively. In view of these figures, we can make the following observations: • While Figure 2a shows an increasing reward as Sensor 2 approaches the target, there is no information change observed when Sensor 2 moves around the target in a circle. This indicates that the Expected RID changes only with respect to the distance from individual sensors to the target rather than with respect to the relative locations of the two sensors. This is in contrast to the value of the Determinant of FIM shown in Figure 2b. • Figure 2b shows a straight line across the two sensor locations, where the values of FIM are singular. This indicates a subspace corresponding to the singular points on which the target state is unobservable. On the other hand, neither the expected RID nor the trace of the FIM take this singular subspace into account. • We observed that the contour plot of the Expected RID in Figure 2a is virtually the same as that of the Trace of FIM shown in Figure 2c. The latter is independent of the two sensor locations as long as their distances to the target remain unchanged and this is verified by (23). This suggests that the Expected RID does not preserve the observed information structure with respect to the two sensor locations after linearisation.
Note that we fix the location of sensor 1 in Figure 2 for a clear visualization of the difference in comparison, given that both sensors are movable in the sensor trajectory optimisation.

Geometric Interpretation on Reward Function
The above argument on the selection of a reward function in cooperative sensing can be described using information geometry. It is well known that the FIM can be used as a metric tensor defining a Riemannian manifold, called the statistical manifold [25]. Riemannian geometrical concepts, thereby, can inform in problems such as ours. For instance, the Fisher Information distance, defined in terms of the Riemannian metric, can be used to compare parameterised distributions.
For the problem at hand, the family of probability distributions S = {p(β|x)}, parameterised in the target location space x ∈ R 2 , forms a 2D statistical manifold where x plays the role of a coordinate system of S. The Fisher information distance (FID) between p(β|x(t 1 )) and p(β|x(t 2 )) is defined as the integral along the curve x(t): where the minimizing curve is a geodesic in the Riemannian manifold. In local coordinates, the geodesic equations are given by the Euler-Lagrange equations as [25] For convenience, we use the subscript i of x to represent its ith component and thus x i , i = 1, 2 are the coordinates of the curve x(t), Γ l ij , i, j = 1, 2 are the Christoffel symbols of the second kind, and we can use the Levi-Civita connection coefficients, where [g lk ] signifies the inverse of G = [g lk ] and Einstein notation for summation is used. The geodesic equations in (25) are ordinary differential equations for the coordinates x i , i = 1, 2. A unique solution x(t) can be found for given initial conditions x(0) andẋ, which is analogous to an initial position x(0) and the "speed" ν ∈ T x S in the sense of the classical mechanics, where T x denotes the tangent vector of S at x.
Assume that a geodesic is projected onto the parameter space R 2 with a starting point x(0) and a tangent vector ν. The exponential map of the starting point is then defined as [26] exp ν [x(0)] = Ψ 1; x(0), ν . (27) where the notation Ψ t; x(0), ν is used to signify a geodesic with a starting point x(0), a tangent vector ν and end point x(t).
It can be shown that the length along the geodesic between x(0) and Ψ 1; x(0), ν is |ν| [26,27]. Thus, for a fixed ν, the plot of geodesics along all directions at x produces an FID circle centered at x in S. Intuitively, the projection of an equal FID circle from statistical manifold to the parameter space, which is generally not isotropic in magnitude, reflects the observability of the underlying sensors at given states. With the FIM corresponding to a sensible reward function, the projection will vary as the state of a sensor changes.
To highlight the difference between the two statistical reward functions in question, we plot the projection of FID circles in parameter space under the scenario where two fixed targets are observed by the two bearing-only sensors. These circles are centred at a target state with the same radius ν in the statistical manifold spanned by the FIM corresponding to the two statistical reward functions, respectively.
We examine the changes of FID circles before and after the Sensor 2 move circularly around one target.
The FID circles, shown in Figure 3a,b, have changed significantly as Sensor 2 moves circularly around the target on the left. This indicates that the amount of target information which may be observed by the two sensors varies with sensor states. On the other hand, if we plot the FID circles in the statistical manifold spanned by the FIM without offdiagonal elements which correspond to using the trace of FIM as a statistical reward function, as shown in Figure 4a,b, the FID circle centered at the target on the left does not change as the Sensor 2 moves circularly around the target.
We emphasize that sensor 2 moves close to the target on the right while keeping the same distance to the target on the left. The FID circles on both left and right targets change in Figure 3 under the Determinant of FIM, but the FID circle on the left remains the same as shown in Figure 4 after the movement of sensor 2 when the Trace of FIM criterion is used. This result indicates that the Fisher information will not be fully explored if trace rather than determinant of the FIM is used as a statistical reward function. The Expected RID performs similarly to the trace of FIM.

Simulation Example
In this section, we illustrate the effectiveness of sensor scheduling using an example of tracking a maneuvering target by two bearing-only sensors. Target motion follows a hidden Markov process which contains the states of a left turn (−5 • ), a right turn (5 • ) and straightline (0 • ), and it is moving at a constant speed of 10 m/s. The initial target state is x 0 = [600, 560, 10 * cos(30 • ), 10 * sin(30 • )] . In the first 50 of a total of 100 scans, the target is in the left turn state and it shifts to a right turn state for the rest of scans. Initially, the two bearing-only sensors are moving at a constant speed of 10m/s from the locations S 1 = (1900, 560) m and S 2 = (1950, 560) m heading in the directions 0 • and 60 • , respectively. They acquire bearing measurements of the target at a sampling rate of T = 5 s. During each scan, they will be steered to one of five directions with respect to their current headings according to the trajectory optimisation decision. The 5 directions are −90 • , −45 • , 0 • , 45 • and 90 • . Therefore, for a N-Step ahead decision process, the number of action hypotheses yielded for the two cooperative sensors are (5 2 ) N . For example, the number of action hypotheses for N = 1 is 5 2 = 25 and for N = 3 is (5 2 ) 3 = 15, 625. We illustrate this example in Figure 5.
In practice, the maximum number of hypothesis histories allowed to steer future measurement at each epoch is constrained by a fixed number "MaxH" to provide a feasible computational overhead for real-time operation. In our simulation, we set MaxH = 1000 and such a choice does not affect the comparison of the optimisation performance under different reward functions. We observed from simulation that under this constraint the track error difference between N > 3 and N = 3 is statistically close to zero and thus is negligible. Therefore, in the performance comparison versus Monte Carlo runs, we only consider N ≤ 3. An EKF tracker [28] is implemented to estimate the posterior density of target state from the target bearing measurements taken by the two sensors. We assume that the target bearing measurements are corrupted with a Gaussian noise of zero-mean with standard deviation of two degrees. Figures 6 and 7 show the typical trajectories of the two bearing-only sensors in the target tracking experiment under the Determinant of FIM (DetFIM), the Expected RID (ExpReward), and the Trace of FIM (TrFIM), respectively. By maximising DetFIM, the two sensors move in a way such that the angle formed by sensor 1, target, sensor 2 is approximately π/2 while approaching the target. Thus, the measurements taken by the two sensors along these computed trajectories minimise the error covariance of the underlying tracker ( Figure 6). On the other hand, the sensor trajectories scheduled by maximising ExpReward (or TrFIM) show no cooperative movement between the sensors while they are approaching the target. This is because none of correlations between the two sensors with respect to the target are used in the evaluation of either ExpReward or TrFIM criterion as discussed in Section 3. By maximizing either of these two reward functions, the two passive sensors move independently (at identical speeds) in the way to minimize their own distances-to-target (Figure 7), which clearly yields larger errors than that of the DetFIM result shown in Figure 6. The statistical results averaged over 100 Monte Carlo runs for each case are shown in Figure 7, where the root-mean-squared position error comparison of the tracker under different reward functions for N = 2 is presented in Figure 8a, and the RMS errors under DetFIM for N = 1, 2, 3, 4 are in Figure 8b. In summary, the simulation results demonstrate that • the tracker yields a smaller error but significantly more computational overhead as N increases. We observed that the computational complexity for N = 1, 2, 3, 4 is roughly at a ratio of 1 : 41 : 2890 : 2905 when MaxH = 1000. Both computational overhead and RMS errors are bounded by MaxH, which is the maximum number of sensor trajectory hypotheses to be maintained. As shown in Figure 8b, the RMS error performances for N ≥ 3 are almost identical. This reflects the trade-off between error performance and computational load as the limited MaxH leads information loss due to the hypotheses pruning process: • the sensor scheduling under DetFIM yields a significantly small error, which is consistent with our analysis. • the RMS error under ExpReward is quite similar to that under TrFIM. The latter completely ignores the correlation between the two sensor states. • The major computational complexity of the simulation comes from the computation of N-Step ahead decision process when N > 2. For example, the average computational overhead per scan are 0.7 s when N = 2 and 5.5 s for N = 3 (on a machine with an Intel 2 Core i7-4600U CPU 2.10GHz processor (Intel: Mountain View, CA, USA)). The computational complexity difference between DetFIM, TrFIM and ExpReward are 1 : 1.02 : 1.45.

Further Discussion
From the analysis in Sections 3 and 4 and simulation results in Section 5, we may conclude the following points, which highlight the outcome from the major contribution of this paper.

•
The measurement Equation (4) suggests that the optimisation of sensor trajectories is required to reduce state estimation error (or maximize FIM) of the underlying system. • The three closed-form reward functions derived by approximation are applicable for the optimisation aiming to maximize Fisher information. Identical performances may be achievable when (4) describes a single sensor case (S 1 = S 2 ). • Only the determinant of FIM should be used for the case where cooperative sensing from multiple sensors is involved. • Since the target state in the FIM calculation can only be approximated by the estimate from tracker, the reward calculated by DetFIM can be senseless if the track error is significantly large. • In the N-Step ahead sensor trajectory optimisation process described in this paper, computational load can become intractable if a large number of sensors are jointly considered or a large N is chosen. Researchers are investigating other solutions, such as the Rapid Random Exploring Tree numerical sampling method [29], in order to address the computation issue.

Conclusions
In this paper, we investigated the problem of sensor trajectory optimisation for tracking a moving target using multiple cooperative passive sensors. The problem is formulated as POMDP, and a N-step ahead sensor trajectory scheduling is considered to steer the measurement decision process that maximises a reward function. Two statistical reward functions, the Expected RID and the determinant of FIM, both of which are derived in closed-forms, are considered. We show that the Expected RID function approximated through linearisation effectively yields a much weaker correlation between the two sensor states and thus the trajectory scheduling performance using it is as poor as using the Trace of FIM. On the other hand, the Determinant of FIM is an appropriate statistical reward function to be used. This conclusion is confirmed by the simulation results presented. Although this work is based on tracking a moving target by two bearing-only sensors scenario, the results can also be applied to other types of sensors which are required to be used jointly to observe the underlying target state, such as Doppler radars. The underlying application areas may consist of those whose decision-making performances depend on the cooperation of multiple dependent sensors such as smart cites, energy and robotics [30][31][32] and practice of game theory [33].