Trajectory Optimisation for Cooperative Target Tracking with Passive Mobile Sensors

Wang, Xuezhi; Ristic, Branko; Himed, Braham; Moran, Bill

doi:10.3390/signals2020014

Open AccessArticle

Trajectory Optimisation for Cooperative Target Tracking with Passive Mobile Sensors

¹

School of Engineering, RMIT University, Melbourne, VIC 3000, Australia

²

Air Force Research Laboratory, Dayton, OH 45433, USA

³

Melbourne School of Engineering, University of Melbourne, Parkville, VIC 3010, Australia

^*

Author to whom correspondence should be addressed.

Signals 2021, 2(2), 174-188; https://doi.org/10.3390/signals2020014

Submission received: 9 November 2020 / Revised: 2 April 2021 / Accepted: 2 April 2021 / Published: 7 April 2021

Download

Browse Figures

Versions Notes

Abstract

:

The paper considers the problem of tracking a moving target using a pair of cooperative bearing-only mobile sensors. Sensor trajectory optimisation plays the central role in this problem, with the objective to minimize the estimation error of the target state. Two approximate closed-form statistical reward functions, referred to as the Expected Rényi information divergence (RID) and the Determinant of the Fisher Information Matrix (FIM), are analysed and discussed in the paper. Being available analytically, the two reward functions are fast to compute and therefore potentially useful for longer horizon sensor trajectory planning. The paper demonstrates, both numerically and from the information geometric viewpoint, that the Determinant of the FIM is a superior reward function. The problem with the Expected RID is that the approximation involved in its derivation significantly reduces the correlation between the target state estimates at two sensors, and consequently results in poorer performance.

Keywords:

cooperative tracking; trajectory optimisation; passive mobile sensors; bearings-only tracking; Fisher information matrix

1. Introduction

Multiple passive sensors can be cooperatively used in a target tracking system to achieve target observability. In general, the performance of such a tracking system is correlated with the states of these sensors [1]. For example, a 2D moving target can be observed by two cooperative bearing-only sensors, while they exhibit observability issues individually [2]. In this case, the track accuracy is highly dependent on the locations of the sensors when taking measurements. A crucial step in the tracking is to plan the moving sensor trajectories ahead of measurement process in a way which will minimise tracking error at a future time when the measurements taken by the two sensors on the planned trajectories are used [3,4].

The sensor trajectory planning problem is also known as trajectory scheduling or optimisation in the literature. It can be cast as a partially observed Markov decision process (POMDP) [5,6], where the decision process is carried out by minimising the cost or maximising the reward against a measure criterion that is related to the Fisher information [7,8,9] or mutual information [10,11]. Ghassemi and Krishnamurthy describe a method in [12], where they use a set of orthogonal basis functions, i.e., the Chebychev polynomial series as suggested in [13] for the control space parameterisation to replace the brute force search for N-step planning ahead. Logothetis et al. proposed an information theoretic approach to sensor scheduling in [10]. The sensor is steered so that the mutual information between the posterior density of the target and the measurement sequence at a desired future time is maximized. Practical algorithms for sensor scheduling were discussed in [11]. Morelande et al. proposed an alternative approach in [14], which calculates the accumulated reward, defined by the expectation of information divergence between the posterior and prior densities of target, for each possible sensor path and the sensor maneuvers are controlled by the action that maximises the accumulated reward. In general, computing reward functions require the evaluation of future measurements. Of course, it is impossible to give an exact computation of the future reward function since future measurements are inaccessible. One may mitigate this problem by using (conditionally) expected values of the reward at future times, or by using Monte Carlo sampling to numerically evaluate future measurements [15,16].

In this paper, we investigate the problem of tracking a moving target via cooperative passive sensors, where the tracking error is a function of sensor trajectories. The focus is on two statistical reward functions for passive sensing, both available in the analytic form. They are referred to as the Expected Rényi information divergence (RID) and the determinant of the Fisher information matrix (FIM). Being available analytically, they are potentially useful for a longer horizon sensor trajectory planning, even on the platforms with limited computational resources. The paper investigates the implications of the approximations involved in derivation of the two reward functions. The analytic expression for the Expected RID is derived assuming a linear-Gaussian case. We show that this approximation significantly reduces the cross-correlation of the tracking error between the two sensor locations, and consequently results in the performance similar to that of the trace of the FIM. On the other hand, the determinant of FIM, where the ground truth is approximated by the prediction of the target state, taking correctly into account the cross-correlation between the two sensors. We also discuss this problem from the information geometric viewpoint by illustrating the geometric properties of the FIM corresponding to the statistical reward function. Finally, we compare the performance of the two reward functions by a numerical example in which a (non-cooperative) target is being chased by two cooperative bearing-only sensors. Preliminary work on this subject was reported in [17].

The paper is organised as follows: the sensor trajectory scheduling problem of interest is described in Section 2. The analytic expressions for the two reward functions, i.e., the Expected RID and the determinant of FIM are derived and their performance discussed in Section 3. In Section 4, we further compare the performance difference between the two reward functions from the information geometric viewpoint. Simulation results are presented in Section 5. Finally, the concluding remarks are given in Section 7.

2. Problem Definition

Let us consider the problem of two-dimensional target tracking using measurements from two cooperative bearing-only sensors as shown in Figure 1.

The target state, which consists of both target location

(x_{k}, y_{k})

and velocity

({\dot{x}}_{k}, {\dot{y}}_{k})

, at time

t_{k}, k = 1, 2, \dots

is

x_{k} = {[x_{k}, y_{k}, {\dot{x}}_{k}, {\dot{y}}_{k}]}^{'} .

The system and measurement models are

\begin{matrix} x_{k} & = F_{k} x_{k - 1} + w_{k}, \end{matrix}

(1)

\begin{matrix} β_{k} & = [\begin{matrix} β 1_{k} \\ β 2_{k} \end{matrix}] = h (x_{k}) + v_{k}, \end{matrix}

(2)

where the system transition matrix is

F_{k} = [\begin{matrix} 1 & t_{k} - t_{k - 1} \\ 0 & 1 \end{matrix}] ⨂ I_{2},

(3)

and

w_{k}

and

v_{k}

are system and measurement noises, respectively. These are assumed to be independent Gaussian distributions, that is,

ω_{k} \sim N (0, Q_{k}), a n d v_{k} \sim N (0, Σ_{k}),

where

Q_{k}

is assumed to be known and

Σ_{k} = [\begin{matrix} σ_{1}^{2} & 0 \\ 0 & σ_{2}^{2} \end{matrix}] .

σ_{1}

and

σ_{2}

are the standard deviations of measurement noises for sensors 1 and sensor 2, respectively.

The measurement function in (2) is expressed as

h (x_{k}, x_{k}^{S_{1}}, x_{k}^{S_{2}}) = [\begin{matrix} {tan}^{- 1} (\frac{x_{k} - x_{k}^{S_{1}}}{y_{k} - y_{k}^{S_{1}}}) \\ {tan}^{- 1} (\frac{x_{k} - x_{k}^{S_{2}}}{y_{k} - y_{k}^{S_{2}}}) \end{matrix}],

(4)

where

(x_{k}^{S_{1}}, y_{k}^{S_{1}})

and

(x_{k}^{S_{2}}, y_{k}^{S_{2}})

are the locations of sensor 1 and sensor 2 at

t_{k}

, respectively.

The objective of target tracking is to estimate the target posterior probability density function (PDF)

p (x_{k} | β_{1 : k})

based on the bearings-only measurement sequence

β_{1 : k}

up to current time

t_{k}

and the prior PDF

p (x_{k - 1} | β_{1 : k - 1})

. Clearly, the measurement model (4) is sensor state dependent, and so tracking accuracy depends on where the two sensors take measurements (i.e., on their motion). Our goal is to find the optimal future motion for the two sensors within the limitations imposed by the sensor platform dynamics, so as to minimize the future expected track error. This is a difficult problem because of the many sources of uncertainty involved in decision-making: the uncertainty in the current target state estimate as well as the uncertainty in the future target motion and measurements.

A practical way to implement the sensor trajectory optimisation procedure is to assume that sensor platforms move at constant speeds,

v_{1}

and

v_{2}

, respectively, and that they have a small set

U

of admissible course corrections at time

t_{k}

. Each element

u_{k}

of

U

is a pair

u_{k} = {[u_{1, k}, u_{2, k}]}^{'}

, where

u_{i, k}

represents the course correction of the ith sensor at k. This results in the velocities of the two sensors in the local coordinates in the time interval

(t_{k}, t_{k + 1}]

as

{\dot{x}}_{k}^{s_{i}} = v_{i} sin u_{i, k}, {\dot{y}}_{k}^{s_{i}} = v_{i} cos u_{i, k}, i = 1, 2 .

(5)

The assumption of constant speeds between sampling intervals is a standard practice for system analysis of discrete data sampling without loss of optimality. The analysis tends to be optimal at a high sampling rate.

In summary, the underlying target tracking is a POMDP and involves an N-step ahead process of sensor trajectory planning, which computes the optimal action sequence

u_{k + 1 : N}^{*} = {u_{k + 1}^{*}, \dots, u_{k + N}^{*}}

such that the tracking error is minimised at the future time

t_{k + N}

. A generic recursive process of sensor trajectory scheduling and target state estimation at discrete-time

k \geq 1

involves the steps listed in Algorithm 1:

Algorithm 1 Sensor scheduling steps at discrete-time k

1:: Inputs: The posterior at k, $p (x_{k} | β_{1 : k})$ ; the sensor locations $(x_{k}^{s_{i}}, y_{k}^{s_{i}})$ and course corrections $u_{i, k}, i = 1, 2$ .
2:: Propose the set of admissible sensor actions $U_{k + 1 : N}$
3:: Perform scheduling based on an optimality criterion and obtain the optimal sequence of future sensor actions $u_{k + 1 : N}^{*} \in U_{k + 1 : N}$ for a future time period $(t_{k + 1}, t_{N})$ .
4:: Steer each sensor $i = 1, 2$ to the computed location $(x_{k + 1}^{s_{i}}, y_{k + 1}^{s_{i}})$ at speed $v_{i}$ and course correction $u_{i, k}$ .
5:: Take (bearings-only) measurements at time $t_{k + 1}$ and then compute the posterior PDF of target $p (x_{k + 1} | β_{1 : k + 1})$ via a suitable tracker.
6:: Outputs: $p (x_{k + 1} | β_{1 : k + 1})$ ; $(x_{k + 1}^{s_{i}}, y_{k + 1}^{s_{i}})$ and $u_{i, k + 1}, i = 1, 2$ .

3. Reward Functions

The sensor trajectory scheduling is part of the underlying target tracking process, involving the computation of a sequence of future actions (sensor course corrections) based on the current target and sensor states. The key role in this process plays the criterion for measuring the reward of the future action. The reward functions used in the literature are based on the information theoretic measures. One criterion is the mutual information between the posterior probability density of the estimated target state and the measurement sequence, which we seek to maximize. A theoretic description is given in [18] and sub-optimal strategies are discussed in [19]. An alternative criterion is to maximize the information divergence between the prior and posterior probability densities of estimated target state [14]. As the ultimate goal of sensor trajectory optimisation is to improve target tracking accuracy, the posterior Cramer–Rao low bound [20] is also used as a cost function to take the estimator performance into account as well. In addition, we can also maximize the determinant or trace of the FIM [13]. All of these criteria are consistent in the sense that they maximize the Fisher information of the underlying system.

In this work, we investigate two reward functions for the problem at hand, which can be derived analytically: the Expected RID and the Determinant of the FIM. For the purpose of comparison, we also introduce the reward function based on the Trace of the FIM. The goal is to find a reward function which is both computationally efficient and reflects correctly the information change associated with various sensor trajectory hypotheses.

3.1. Expected RID

The Rényi information divergence [21] is the information divergence for probability densities

f_{1} (x)

and

f_{2} (x)

, for

α \geq 0

,

I_{α} (f_{1}, f_{2}) = \frac{1}{α - 1} ln [\int f_{1} {(x)}^{α} f_{2} {(x)}^{α - 1} d x] .

(6)

For the underlying system with one-step ahead sensor scheduling,

f_{1} = p (x_{k + 1} | β_{1 : k + 1})

signifies the posterior density at

k + 1

and

f_{2} = p (x_{k + 1} | β_{1 : k})

signifies the predicted density at

k + 1

, where

β_{k + 1}

is the future measurement at time

k + 1

given by (2). Thus, (6) becomes

\begin{matrix} I_{α} (p (x_{k + 1} | β_{1 : k + 1}), p (x_{k + 1} | β_{1 : k})) \\ = \frac{1}{α - 1} ln [\int p {(x_{k + 1} | β_{1 : k + 1})}^{α} p {(x_{k + 1} | β_{1 : k})}^{α - 1} d x_{k + 1}] . \end{matrix}

(7)

The evaluation of the integral (7) requires the future measurement

β_{k + 1}

, which may be obtained numerically via Monte Carlo simulations [15,16]. A statistical solution in a closed-form can be derived for the linear Gaussian case [14], where the dependence on the future measurement

β_{k + 1}

is removed by taking the expectation w.r.t.

p (β_{k + 1} | β_{1 : k})

.

The linear Gaussian case is described by the transition

x_{k} | x_{k - 1} \sim N (x_{k}; F x_{k - 1}, Q),

(8)

and measurement

y_{k} | x_{k} \sim N (y_{k}; H x_{k}, R) .

(9)

The predicted measurement distribution is then given by

y_{k - 1} | y_{k} \sim N (y_{k + 1}; {\hat{y}}_{k + 1}, S_{k + 1}) .

(10)

where

S_{k + 1} = H P_{k | k - 1} H^{'} + R

.

Assume that the sequence of measurements

y_{1 : k}

has been collected so that the posterior density is given by

p (x_{k} | y_{1 : k}) = N (x_{k}; {\hat{x}}_{k | k}, P_{k | k}),

(11)

where

P_{k | k}

is the updated covariance at k.

For the properties of the Kalman filter, the divergence between the prior PDF and posterior PDF at

k + 1

is given by

\begin{matrix} I_{α} = & \frac{1}{α - 1} ln [\frac{{| 2 π R / α |}^{1 / 2} N (y_{k + 1}; {\hat{y}}_{k + 1}, S_{k + 1} + R (1 - α) / α)}{{| 2 π R |}^{α / 2} N {(y_{k + 1}; {\hat{y}}_{k + 1}, S_{k + 1})}^{α}}] \\ = & \frac{0.5}{α - 1} ln (\frac{{| R |}^{1 - α} {| S_{k + 1} |}^{α}}{| α S_{k + 1} + (1 - α) R |}) \\ + & 0.5 \int {(y_{k + 1} - {\hat{y}}_{k + 1})}^{'} [I - S_{k + 1}^{- 1} {(α R^{- 1} + (1 - α) S_{k + 1}^{- 1})}^{- 1}] \\ \times S_{k + 1}^{- 1} (y_{k + 1} - {\hat{y}}_{k + 1}) N (y_{k + 1}; {\hat{y}}_{k + 1}, S_{k + 1}) d y_{k + 1} . \end{matrix}

(12)

Taking the expectation of (12) with respect to

p (y_{k + 1} | y_{1 : k})

and using the formula

\begin{matrix} \int {(z - m)}^{'} Ω^{- 1} & (z - m) N (z; μ, Σ) d z \\ = {(μ - m)}^{'} Ω^{- 1} (μ - m) + t r (Ω^{- 1} Σ), \end{matrix}

(13)

we obtain the closed-form Expected RID function [22]:

\begin{matrix} E_{y_{k + 1 | 1 : k}} I_{α} = & \frac{1}{2 (α - 1)} ln (\frac{| α Ψ_{k + 1} + I |}{| Ψ_{k + 1} {+ I |}^{α}}) \\ + \frac{1}{2} t r (I - {(α Ψ_{k + 1} + I)}^{- 1}) \end{matrix}

(14)

where

Ψ_{k + 1} = R^{- 1} H P_{k + 1 | k} H^{'}

and

P_{k + 1 | k}

is the covariance matrix associated with the predicted state estimate

{\hat{x}}_{k + 1 | k}

.

For the bearings-only problem described in Section 2, see Equations (1) and (2), the Expected RID can only be approximated via linearisation, i.e.,

H

in (14) is replaced by the Jacobian of (4)

H_{k} = \nabla_{x} {h (x) |}_{x = {\hat{x}}_{k + 1 | k}} .

(15)

Note that

P_{k + 1 | k}

is obtained from the underlying tracker [23] and

R

is replaced by

Σ_{k}

associated with (2).

3.2. Determinant of Fisher Information Matrix

In general, it is non-trivial to derive the FIM or its determinant in a closed-form. For this particular case, the measurement model (4) is independent of target velocity and the resulting FIM is only of rank 2. In consequence, we only need to consider the target position components in derivation of the FIM, and therefore the target state vector

x_{k}

in derivation will denote just the target position components (

{[x_{k}, y_{k}]}^{'}

). A closed form expression for the determinant of the FIM is analytically tractable as follows.

Suppose that the posterior at

k - 1

is

p (x_{k - 1} | β_{1 : k - 1})

. Using the current sensor locations and a proposed action

u_{k}

, one can compute the one-step ahead sensor locations

(x_{k}^{s_{i}}, y_{k}^{s_{i}})

for

i = 1, 2

. The FIM

G_{k}

at time

t_{k}

for the measurement (2) is defined to be

G_{k} \overset{Δ}{=} E {[(\nabla_{x} ln p (β | x)) {(\nabla_{x} ln p (β | x))}^{'}]}_{x = x_{k}},

(16)

where

ln p (β | x)

is the log-likelihood, which is obtained from Equation (2), and

G_{k}

is a

2 \times 2

matrix with entries

G_{k} = [\begin{matrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{matrix}]

(17)

Under the Gaussian measurement noise assumption, we have

\begin{matrix} g_{i j} = {[\frac{\partial h (x_{k})}{\partial {(x_{k})}_{i}}]}^{T} & Σ_{k}^{- 1} [\frac{\partial h (x_{k})}{\partial {(x_{k})}_{j}}] \\ + \frac{1}{2} t r [Σ_{k}^{- 1} \frac{\partial Σ (x)}{\partial {(x_{k})}_{i}} Σ_{k}^{- 1} \frac{\partial Σ (x)}{\partial {(x_{k})}_{j}}] . \end{matrix}

(18)

Therefore,

\begin{matrix} g_{11} & = \sum_{i = 1}^{2} \frac{1}{σ_{i}^{2}} \frac{{(y_{k} - y_{k}^{s_{i}})}^{2}}{{[{(x_{k} - x_{k}^{s_{i}})}^{2} + {(y_{k} - y_{k}^{s_{i}})}^{2}]}^{2}}, \end{matrix}

(19)

\begin{matrix} g_{12} & = g_{21} = \sum_{i = 1}^{2} - \frac{1}{σ_{i}^{2}} \frac{(x_{k} - x_{k}^{s_{i}}) (y_{k} - y_{k}^{s_{i}})}{{[{(x_{k} - x_{k}^{s_{i}})}^{2} + {(y_{k} - y_{k}^{s_{i}})}^{2}]}^{2}}, \end{matrix}

(20)

\begin{matrix} g_{22} & = \sum_{i = 1}^{2} \frac{1}{σ_{i}^{2}} \frac{{(x_{k} - x_{k}^{s_{i}})}^{2}}{{[{(x_{k} - x_{k}^{s_{i}})}^{2} + {(y_{k} - y_{k}^{s_{i}})}^{2}]}^{2}} . \end{matrix}

(21)

The determinant of

G_{k}

is a function of

σ_{1}, σ_{2}, x_{k}, x_{k}^{s_{1}}

and

x_{k}^{s_{2}}

. From the definition and using (19)–(21), we have

\begin{matrix} D e t {G_{k}} \overset{Δ}{=} \frac{1}{σ_{1}^{2} σ_{2}^{2}} \\ \times \frac{{[(x_{k} - x^{s_{1}}) (y_{k} - y^{s_{2}}) - (x_{k} - x^{s_{2}}) (y_{k} - y^{s_{1}})]}^{2}}{{[{(x_{k} - x^{s_{1}})}^{2} + {(y_{k} - y^{s_{1}})}^{2}]}^{2} {[{(x_{k} - x^{s_{2}})}^{2} + {(y_{k} - y^{s_{2}})}^{2}]}^{2}} . \end{matrix}

(22)

Calculation of the FIM requires knowledge of the target location at future time k, which in practice is approximated using the predicted target state

{\hat{x}}_{k}

based on the point estimate from

p (x_{k - 1} | β_{1 : k - 1})

and the system dynamics (1).

3.3. Trace of Fisher Information Matrix

In view of (19)–(21), the trace of the FIM is given by

\begin{matrix} t r {(G)}_{k} & \overset{Δ}{=} \sum_{i = 1}^{2} g_{i, i} = \sum_{i = 1}^{2} \frac{1}{σ_{i}^{2}} \frac{1}{{(x_{k} - x_{k}^{s_{i}})}^{2} + {(y_{k} - y_{k}^{s_{i}})}^{2}} . \end{matrix}

(23)

The trace of the FIM is a widely used objective function for sensor information maximisation [24], but it is more conservative than the determinant of the FIM since it does not take into account the off-diagonal terms of FIM. It will be used to explain the behaviour of the Expected RID function.

3.4. Remarks on the Reward Functions

Both the Expected RID and the Determinant of FIM involve approximations: the former using linearisation, the latter using the predicted target state in place of the true state. The following analysis highlights the difference between the two reward functions after approximation.

To compare the performance difference of the reward functions, we fix the location of Sensor 1 at

S_{1} = (50, 300)

, and plot the reward of one step ahead versus the location of Sensor 2. In the simulation,

P

is set to be a constant,

σ_{1} = 0.175

(rad) and

σ_{2} = 0.2618

(rad). The contour plots for the Expected RID and Determinant of FIM are in Figure 2a,b, respectively.

In view of these figures, we can make the following observations:

While Figure 2a shows an increasing reward as Sensor 2 approaches the target, there is no information change observed when Sensor 2 moves around the target in a circle. This indicates that the Expected RID changes only with respect to the distance from individual sensors to the target rather than with respect to the relative locations of the two sensors. This is in contrast to the value of the Determinant of FIM shown in Figure 2b.
Figure 2b shows a straight line across the two sensor locations, where the values of FIM are singular. This indicates a subspace corresponding to the singular points on which the target state is unobservable. On the other hand, neither the expected RID nor the trace of the FIM take this singular subspace into account.
We observed that the contour plot of the Expected RID in Figure 2a is virtually the same as that of the Trace of FIM shown in Figure 2c. The latter is independent of the two sensor locations as long as their distances to the target remain unchanged and this is verified by (23). This suggests that the Expected RID does not preserve the observed information structure with respect to the two sensor locations after linearisation.

Note that we fix the location of sensor 1 in Figure 2 for a clear visualization of the difference in comparison, given that both sensors are movable in the sensor trajectory optimisation.

4. Geometric Interpretation on Reward Function

The above argument on the selection of a reward function in cooperative sensing can be described using information geometry. It is well known that the FIM can be used as a metric tensor defining a Riemannian manifold, called the statistical manifold [25]. Riemannian geometrical concepts, thereby, can inform in problems such as ours. For instance, the Fisher Information distance, defined in terms of the Riemannian metric, can be used to compare parameterised distributions.

For the problem at hand, the family of probability distributions

S = {p (β | x)}

, parameterised in the target location space

x \in R^{2}

, forms a 2D statistical manifold where

x

plays the role of a coordinate system of S. The Fisher information distance (FID) between

p (β | x (t_{1}))

and

p (β | x (t_{2}))

is defined as the integral along the curve

x (t)

:

\begin{matrix} D_{F} (x (t_{1}), x (t_{2})) \overset{▵}{=} \end{matrix}

\begin{matrix} min_{x} \int_{t_{1}}^{t_{2}} (\sqrt{{(\frac{d x (t)}{d t})}^{T} G (x (t)) (\frac{d x (t)}{d t})}) d t, \end{matrix}

(24)

where the minimizing curve is a geodesic in the Riemannian manifold. In local coordinates, the geodesic equations are given by the Euler–Lagrange equations as [25]

\frac{d^{2} x_{l}}{d t^{2}} + \sum_{i = 1}^{n} \sum_{j = 1}^{n} Γ_{i j}^{k} \frac{d x_{i}}{d t} \frac{d x_{j}}{d t} = 0, l = 1, 2 . n = 2 .

(25)

For convenience, we use the subscript i of

x

to represent its ith component and thus

x_{i}, i = 1, 2

are the coordinates of the curve

x (t)

,

Γ_{i j}^{l}, i, j = 1, 2

are the Christoffel symbols of the second kind, and we can use the Levi–Civita connection coefficients,

Γ_{i j}^{l} = \frac{1}{2} \sum_{k = 1}^{2} g^{l k} (\frac{\partial g_{i k}}{\partial x_{j}} + \frac{\partial g_{j k}}{\partial x_{i}} - \frac{\partial g_{i j}}{\partial x_{k}}), i, j, l = 1, 2 .

(26)

where

[g^{l k}]

signifies the inverse of

G = [g_{l k}]

and Einstein notation for summation is used.

The geodesic equations in (25) are ordinary differential equations for the coordinates

x_{i}, i = 1, 2

. A unique solution

x (t)

can be found for given initial conditions

x (0)

and

\dot{x}

, which is analogous to an initial position

x (0)

and the “speed”

ν \in T_{x} S

in the sense of the classical mechanics, where

T_{x}

denotes the tangent vector of S at

x

.

Assume that a geodesic is projected onto the parameter space

R^{2}

with a starting point

x (0)

and a tangent vector

ν

. The exponential map of the starting point is then defined as [26]

{exp}_{ν} [x (0)] \overset{▵}{=} Ψ (1; x (0), ν) .

(27)

where the notation

Ψ (t; x (0), ν)

is used to signify a geodesic with a starting point

x (0)

, a tangent vector

ν

and end point

x (t)

.

It can be shown that the length along the geodesic between

x (0)

and

Ψ (1; x (0), ν)

is

| ν |

[26,27]. Thus, for a fixed

ν

, the plot of geodesics along all directions at

x

produces an FID circle centered at

x

in S. Intuitively, the projection of an equal FID circle from statistical manifold to the parameter space, which is generally not isotropic in magnitude, reflects the observability of the underlying sensors at given states. With the FIM corresponding to a sensible reward function, the projection will vary as the state of a sensor changes.

To highlight the difference between the two statistical reward functions in question, we plot the projection of FID circles in parameter space under the scenario where two fixed targets are observed by the two bearing-only sensors. These circles are centred at a target state with the same radius

ν

in the statistical manifold spanned by the FIM corresponding to the two statistical reward functions, respectively.

We examine the changes of FID circles before and after the Sensor 2 move circularly around one target.

The FID circles, shown in Figure 3a,b, have changed significantly as Sensor 2 moves circularly around the target on the left. This indicates that the amount of target information which may be observed by the two sensors varies with sensor states. On the other hand, if we plot the FID circles in the statistical manifold spanned by the FIM without off-diagonal elements which correspond to using the trace of FIM as a statistical reward function, as shown in Figure 4a,b, the FID circle centered at the target on the left does not change as the Sensor 2 moves circularly around the target.

We emphasize that sensor 2 moves close to the target on the right while keeping the same distance to the target on the left. The FID circles on both left and right targets change in Figure 3 under the Determinant of FIM, but the FID circle on the left remains the same as shown in Figure 4 after the movement of sensor 2 when the Trace of FIM criterion is used. This result indicates that the Fisher information will not be fully explored if trace rather than determinant of the FIM is used as a statistical reward function. The Expected RID performs similarly to the trace of FIM.

5. Simulation Example

In this section, we illustrate the effectiveness of sensor scheduling using an example of tracking a maneuvering target by two bearing-only sensors. Target motion follows a hidden Markov process which contains the states of a left turn (

- 5^{\circ}

), a right turn (

5^{\circ}

) and straightline (

0^{\circ}

), and it is moving at a constant speed of 10 m/s. The initial target state is

x_{0} = {[600, 560, 10 * cos (30^{\circ}), 10 * sin (30^{\circ})]}^{'} .

In the first 50 of a total of 100 scans, the target is in the left turn state and it shifts to a right turn state for the rest of scans. Initially, the two bearing-only sensors are moving at a constant speed of 10m/s from the locations

S_{1} = (1900, 560)

m and

S_{2} = (1950, 560)

m heading in the directions

0^{\circ}

and

60^{\circ}

, respectively. They acquire bearing measurements of the target at a sampling rate of T = 5 s. During each scan, they will be steered to one of five directions with respect to their current headings according to the trajectory optimisation decision. The 5 directions are

- 90^{\circ}

,

- 45^{\circ}, 0^{\circ}, 45^{\circ}

and

90^{\circ}

.

Therefore, for a N-Step ahead decision process, the number of action hypotheses yielded for the two cooperative sensors are

{(5^{2})}^{N}

. For example, the number of action hypotheses for

N = 1

is

5^{2} = 25

and for

N = 3

is

{(5^{2})}^{3} = 15, 625

. We illustrate this example in Figure 5.

In practice, the maximum number of hypothesis histories allowed to steer future measurement at each epoch is constrained by a fixed number “MaxH” to provide a feasible computational overhead for real-time operation. In our simulation, we set MaxH = 1000 and such a choice does not affect the comparison of the optimisation performance under different reward functions. We observed from simulation that under this constraint the track error difference between

N > 3

and

N = 3

is statistically close to zero and thus is negligible. Therefore, in the performance comparison versus Monte Carlo runs, we only consider

N \leq 3

.

An EKF tracker [28] is implemented to estimate the posterior density of target state from the target bearing measurements taken by the two sensors. We assume that the target bearing measurements are corrupted with a Gaussian noise of zero-mean with standard deviation of two degrees.

Figure 6 and Figure 7 show the typical trajectories of the two bearing-only sensors in the target tracking experiment under the Determinant of FIM (DetFIM), the Expected RID (ExpReward), and the Trace of FIM (TrFIM), respectively. By maximising DetFIM, the two sensors move in a way such that the angle formed by sensor 1, target, sensor 2 is approximately

π / 2

while approaching the target. Thus, the measurements taken by the two sensors along these computed trajectories minimise the error covariance of the underlying tracker (Figure 6). On the other hand, the sensor trajectories scheduled by maximising ExpReward (or TrFIM) show no cooperative movement between the sensors while they are approaching the target. This is because none of correlations between the two sensors with respect to the target are used in the evaluation of either ExpReward or TrFIM criterion as discussed in Section 3. By maximizing either of these two reward functions, the two passive sensors move independently (at identical speeds) in the way to minimize their own distances-to-target (Figure 7), which clearly yields larger errors than that of the DetFIM result shown in Figure 6.

The statistical results averaged over 100 Monte Carlo runs for each case are shown in Figure 7, where the root-mean-squared position error comparison of the tracker under different reward functions for

N = 2

is presented in Figure 8a, and the RMS errors under DetFIM for

N = 1, 2, 3, 4

are in Figure 8b. In summary, the simulation results demonstrate that

the tracker yields a smaller error but significantly more computational overhead as N increases. We observed that the computational complexity for $N = 1, 2, 3, 4$ is roughly at a ratio of $1 : 41 : 2890 : 2905$ when MaxH = 1000. Both computational overhead and RMS errors are bounded by MaxH, which is the maximum number of sensor trajectory hypotheses to be maintained. As shown in Figure 8b, the RMS error performances for $N \geq 3$ are almost identical. This reflects the trade-off between error performance and computational load as the limited MaxH leads information loss due to the hypotheses pruning process:
the sensor scheduling under DetFIM yields a significantly small error, which is consistent with our analysis.
the RMS error under ExpReward is quite similar to that under TrFIM. The latter completely ignores the correlation between the two sensor states.
The major computational complexity of the simulation comes from the computation of N-Step ahead decision process when $N > 2$ . For example, the average computational overhead per scan are 0.7 s when $N = 2$ and 5.5 s for $N = 3$ (on a machine with an Intel 2 Core i7-4600U CPU 2.10GHz processor (Intel: Mountain View, CA, USA)). The computational complexity difference between DetFIM, TrFIM and ExpReward are $1 : 1.02 : 1.45$ .

6. Further Discussion

From the analysis in Section 3 and Section 4 and simulation results in Section 5, we may conclude the following points, which highlight the outcome from the major contribution of this paper.

The measurement Equation (4) suggests that the optimisation of sensor trajectories is required to reduce state estimation error (or maximize FIM) of the underlying system.
The three closed-form reward functions derived by approximation are applicable for the optimisation aiming to maximize Fisher information. Identical performances may be achievable when (4) describes a single sensor case ( $S_{1} = S_{2}$ ).
Only the determinant of FIM should be used for the case where cooperative sensing from multiple sensors is involved.
Since the target state in the FIM calculation can only be approximated by the estimate from tracker, the reward calculated by DetFIM can be senseless if the track error is significantly large.
In the N-Step ahead sensor trajectory optimisation process described in this paper, computational load can become intractable if a large number of sensors are jointly considered or a large N is chosen. Researchers are investigating other solutions, such as the Rapid Random Exploring Tree numerical sampling method [29], in order to address the computation issue.

7. Conclusions

In this paper, we investigated the problem of sensor trajectory optimisation for tracking a moving target using multiple cooperative passive sensors. The problem is formulated as POMDP, and a N-step ahead sensor trajectory scheduling is considered to steer the measurement decision process that maximises a reward function. Two statistical reward functions, the Expected RID and the determinant of FIM, both of which are derived in closed-forms, are considered. We show that the Expected RID function approximated through linearisation effectively yields a much weaker correlation between the two sensor states and thus the trajectory scheduling performance using it is as poor as using the Trace of FIM. On the other hand, the Determinant of FIM is an appropriate statistical reward function to be used. This conclusion is confirmed by the simulation results presented. Although this work is based on tracking a moving target by two bearing-only sensors scenario, the results can also be applied to other types of sensors which are required to be used jointly to observe the underlying target state, such as Doppler radars. The underlying application areas may consist of those whose decision-making performances depend on the cooperation of multiple dependent sensors such as smart cites, energy and robotics [30,31,32] and practice of game theory [33].

Author Contributions

Conceptualization, X.W. and B.H.; methodology, X.W. and B.R.; validation, X.W., B.R. and B.M.; formal analysis, X.W.; investigation, X.W.; writing—original draft preparation, X.W.; writing—review and editing, X.W., B.R., B.H. and B.M.; supervision, B.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the Asian Office of Aerospace Research & Development (AOARD)/AFRL under Grant No. FA2386-15-1-4066 and Australia Research Council under Grant No. ARC DP130103582.

Acknowledgments

We thank Sanjeev Arulampalam from the Defence, Science, Technology Group, Australia for his helpful comments on the simulation study of this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

Musicki, D. Multi-target tracking using multiple passive bearings-only asynchronous sensors. IEEE Trans. Aerosp. Electron. Syst. 2008, 44, 1151–1160. [Google Scholar] [CrossRef]
Arulampalam, M.S.; Ristic, B.; Gordon, N.; Mansell, T. Bearings-only tracking of manoeuvring targets using particle filters. EURASIP J. Adv. Signal Process. 2004, 2004, 1–15. [Google Scholar] [CrossRef] [Green Version]
Singh, S.S.; Kantas, N.; Vo, B.N.; Doucet, A.; Evans, R.J. Simulation-based optimal sensor scheduling with application to observer trajectory planning. Automatica 2007, 43, 817–830. [Google Scholar] [CrossRef] [Green Version]
Hernandez, M.L. Optimal sensor trajectories in bearings-only tracking. In Proceedings of the 7th International Conference on Information Fusion, Stockholm, Sweden, 28 June–1 July 2004. [Google Scholar]
Krishnamurthy, V. Algorithms for Optimal Scheduling and Management of Hidden Markov Model Sensors. IEEE Trans. Signal Process. 2002, 50, 1382–1397. [Google Scholar] [CrossRef] [Green Version]
Cadre, J.-P.L.; Trémois, O. Optimization of the Observer Motion Using Dynamic Programming. In Proceedings of the 1995 International Conference on Acoustics, Speech, and Signal Processing, Detroit, MI, USA, 9–12 May 1995; Volume 5, pp. 3567–3570. [Google Scholar]
Fawcett, J.A. Effect of course maneuvers on bearings-only range estimation. IEEE Trans. Acoust. Speech Signal Process. 1988, 36, 1193–1199. [Google Scholar] [CrossRef]
Cadre, J.-P.L.; Laurent-Michel, S. Optimizing the receiver maneuvers for bearings-only tracking. Automatica 1999, 35, 591–606. [Google Scholar] [CrossRef]
Helferty, J.P.; Mdgett, D.R.; Dzidlski, J.E. Trajectory optimisation for minimum range error in bearings-only source localization. In Proceedings of the Oceans’ 93 Engineering in Harmony with Ocean, Victoria, BC, Canada, 18–21 October 1993; pp. 229–234. [Google Scholar]
Logothetis, A.; Krishnamurthy, V.; Holst, J.; Isaksson, A. Modal State Estimation of a Maneuvering Target in Clutter. In Proceedings of the 36th IEEE Conference on Decision and Control, San Diego, CA, USA, 12 December 1997; pp. 5024–5029. [Google Scholar]
Logothetis, A.; Krishnamurthy, V. Modal Trajectory Estimation for Jump Markov Linear Systems via the Expectation Maximization Algorithm. In Proceedings of the 36th IEEE Conference on Decision and Control, San Diego, CA, USA, 12 December 1997; pp. 1700–1705. [Google Scholar]
Ghassemi, F.; Krishnamurthy, V. A method for constructing the observer trajectory in bearins-only tracking of targets with a Markovian model. In Proceedings of the 2005 IEEE International Conference on Information Acquisition, Hong Kong, China, 27 June–3 July 2005. [Google Scholar]
Oshman, Y.; Davidson, P. Optimization of observer trajectories for bearings-only target localization. IEEE Trans. Aerosp. Electron. Syst. 1999, 35, 892–902. [Google Scholar] [CrossRef]
Hanselmann, T.; Morelande, M.; Moran, B.; Sarunic, P. Sensor scheduling for multiple target tracking and detection using passive measurements. In Proceedings of the 11th International Conference on Information Fusion, Cologne, Germany, 30 June–3 July 2008; pp. 1528–1535. [Google Scholar]
Ristic, B.; Vo, B.N. Sensor control for multi-object state-space estimation using random finite sets. Automatica 2010, 46, 1812–1818. [Google Scholar] [CrossRef]
Beard, M.; Vo, B.T.; Vo, B.N.; Arulampalam, S. Sensor control for multi-target tracking using cauchy-schwarz divergence. In Proceedings of the 18th International Conference on Information Fusion, Washington, DC, USA, 6–9 July 2015; pp. 937–944. [Google Scholar]
Wang, X.; Ristic, B.; Himed, B.; Moran, B. Joint passive sensor scheduling for target tracking. In Proceedings of the 20th International Conference on Information Fusion, Xi’an, China, 10–13 July 2017; pp. 1–7. [Google Scholar]
Logothetis, A.; Isaksson, A.; Evans, R.J. An information theoretic approach to observer path design for bearings-only tracking. In Proceedings of the 36th IEEE Conference on Decision and Control, San Diego, CA, USA, 12 December 1997; pp. 3132–3237. [Google Scholar]
Logothetis, A.; Isaksson, A.; Evans, R.J. Comparison of suboptimal strategies for optimal own-ship maneuvers in bearings-only tracking. In Proceedings of the American Control Conference, Philadelphia, PA, USA, 26 June 1998; pp. 3334–3338. [Google Scholar]
Hernandez, M.L.; Farina, A.; Ristic, B. PCRLB for tracking in cluttered environments: Measurement sequence conditioning approach. IEEE Trans. Aerosp. Electron. Syst. 2006, 42, 680–704. [Google Scholar] [CrossRef]
Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics and Probability, Berkeley, CA, USA, 20–30 July 1960; pp. 547–561. [Google Scholar]
Wang, X.; Morelande, M.; Moran, B. Sensor Scheduling for Bearings-Only Tracking with A Single Sensor. In Proceedings of the Fifth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP 2009), Melbourne, VIC, Australia, 7–10 December 2009; pp. 67–72. [Google Scholar]
Bar-Shalom, Y.; Fortmann, T.E. Tracking and Data Association; Academic Press: Cambridge, MA, USA, 1988. [Google Scholar]
Passerieux, J.M.; Cappel, D.V. Optimal observer maneuver for bearings-only tracking. IEEE Trans. Aerosp. Electron. Syst. 1998, 34, 777–788. [Google Scholar] [CrossRef]
Abdel-All, N.H.; Abd-Ellah, H.N.; Moustafa, H.M. Information geometry and statistical manifold. Chaos Solitons Fractals 2003, 15, 161–172. [Google Scholar] [CrossRef]
Petersen, P. Riemannian Geometry; Springer: New York, NY, USA, 1998. [Google Scholar]
Yang, Z.; Laaksonen, J. Principal whitened gradient form information geometry. Neural Netw. 2008, 21, 232–240. [Google Scholar] [CrossRef] [PubMed]
Wang, X.; Morelande, M.; Moran, B. Target Motion Analysis Using Single Sensor Bearings-Only Measurements. In Proceedings of the 2nd International Conference on Image Processing and Signal Processing (CISP’09), Tianjin, China, 17–19 October 2009; pp. 2094–2099. [Google Scholar]
Wang, X.; Williams, S.; Angley, D.; Gilliam, C.; Ellem, T.J.R.; Bessell, A.; Moran, B. RRT* Trajectory Scheduling Using Angles-Only Measurements for AUV Recovery. In Proceedings of the 22nd International Conference on Information Fusion (FUSION 2019), Ottawa, ON, Canada, 2–5 July 2019. [Google Scholar]
Leccese, F.; Cagnetti, M.; Trinca, D. A smart city application: A fully controlled street lighting isle based on Raspberry-Pi card, a ZigBee sensor network and WiMAX. Sensors 2014, 14, 24408–24424. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Doyle, F.; Duarte, M.J.R.; Cosgrove, J. Design of an Embedded Sensor Network for Application in Energy Monitoring of Commercial and Industrial Facilities. Energy Procedia 2015, 83, 504–514. [Google Scholar] [CrossRef] [Green Version]
Foumani, M.; Smith-Miles, K.; Gunawan, I. Scheduling of two-machine robotic rework cells: In-process, post-process and in-line inspection scenarios. Robot. Auton. Syst. 2017, 91, 210–225. [Google Scholar] [CrossRef]
Yang, Y.; Moran, B.; Wang, X.; Brown, T.C.; Williams, S.; Pan, Q. Experimental analysis of a game-theoretic formulation of target tracking. Automatica 2020, 114, 108793. [Google Scholar] [CrossRef]

Figure 1. Illustration of the bearings-only tracking scenario, where the measurement covariance depends on the locations of the two sensors.

Figure 2. Reward function performance comparison—Normalised values of reward functions for the location of sensor 2 when the locations of sensor 1 and target are fixed. (a) expected RID; (b) determinant of FIM; (c) trace of FIM. The normalised value of colorbar indicates the relative amount of Fisher information gained by the sensor system.

Figure 3. Illustration of the circles of statistical manifold spanned by FIM for given sensor locations (a) sensor 1 is at (1500, 400) and sensor 2 (973, 387); (b) sensor 1 is at (1500, 400) and sensor 2 moved to (1219, 544) in a circle around the target on the left. Colorbar value represents the changes of the amount of information gained by the sensor system if the target moves between the center and a manifold circle boundary.

Figure 4. Illustration of the circles of statistical manifold spanned by the FIM with constant valued off-diagonal elements for given sensor locations (a) sensor 1 is at (1500, 400) and sensor 2 (973, 387); (b) sensor 1 is at (1500, 400) and sensor 2 moved to (1219, 544) in a circle around the target on the left. Colorbar value represents the changes of the amount of information gained by the sensor system if the target moves between the center and a manifold circle boundary.

Figure 5. Illustration of sensor trajectory hypothesis trees under a joint sensor scheduling. For

N = 3

steps ahead calculation at time k, two sensors with five possible directions yields a total of 15,625 possible trajectory hypotheses. The color scheme signifies the accumulated reward values (under DetFIM) with red the highest and dark blue the lowest.

Figure 5. Illustration of sensor trajectory hypothesis trees under a joint sensor scheduling. For

N = 3

steps ahead calculation at time k, two sensors with five possible directions yields a total of 15,625 possible trajectory hypotheses. The color scheme signifies the accumulated reward values (under DetFIM) with red the highest and dark blue the lowest.

Figure 6. Illustration of the estimated target trajectory and the motion trajectories of the two bearing-only sensors optimized under DetFIM plotted from a single run, along with target ground truth trajectory. Colorbar value signifies sampling time (scan).

Figure 7. Estimated target trajectories with ground truth trajectory and the motion trajectories of the two bearing-only sensors optimized under (a) ExpReward; (b) TrFIM, plotted from a single run. Colorbar value signifies sampling time (scan).

Figure 8. (a) RMS errors of the estimated target trajectory with sensor trajectories optimized under different reward methods for a

N = 2

-step ahead decision process; (b) RMS errors of the estimated target trajectory with sensor trajectories optimized under DetFIM for

N = 1, 2, 3, 4

. Statistically, there is little difference between

N = 3

and

N = 4

due to the bound on the number of hypotheses kept at each step by MaxH.

Figure 8. (a) RMS errors of the estimated target trajectory with sensor trajectories optimized under different reward methods for a

N = 2

-step ahead decision process; (b) RMS errors of the estimated target trajectory with sensor trajectories optimized under DetFIM for

N = 1, 2, 3, 4

. Statistically, there is little difference between

N = 3

and

N = 4

due to the bound on the number of hypotheses kept at each step by MaxH.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, X.; Ristic, B.; Himed, B.; Moran, B. Trajectory Optimisation for Cooperative Target Tracking with Passive Mobile Sensors. Signals 2021, 2, 174-188. https://doi.org/10.3390/signals2020014

AMA Style

Wang X, Ristic B, Himed B, Moran B. Trajectory Optimisation for Cooperative Target Tracking with Passive Mobile Sensors. Signals. 2021; 2(2):174-188. https://doi.org/10.3390/signals2020014

Chicago/Turabian Style

Wang, Xuezhi, Branko Ristic, Braham Himed, and Bill Moran. 2021. "Trajectory Optimisation for Cooperative Target Tracking with Passive Mobile Sensors" Signals 2, no. 2: 174-188. https://doi.org/10.3390/signals2020014

APA Style

Wang, X., Ristic, B., Himed, B., & Moran, B. (2021). Trajectory Optimisation for Cooperative Target Tracking with Passive Mobile Sensors. Signals, 2(2), 174-188. https://doi.org/10.3390/signals2020014

Article Menu

Trajectory Optimisation for Cooperative Target Tracking with Passive Mobile Sensors

Abstract

1. Introduction

2. Problem Definition

3. Reward Functions

3.1. Expected RID

3.2. Determinant of Fisher Information Matrix

3.3. Trace of Fisher Information Matrix

3.4. Remarks on the Reward Functions

4. Geometric Interpretation on Reward Function

5. Simulation Example

6. Further Discussion

7. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI