The Information Geometry of Sensor Configuration

In problems of parameter estimation from sensor data, the Fisher information provides a measure of the performance of the sensor; effectively, in an infinitesimal sense, how much information about the parameters can be obtained from the measurements. From the geometric viewpoint, it is a Riemannian metric on the manifold of parameters of the observed system. In this paper, we consider the case of parameterized sensors and answer the question, “How best to reconfigure a sensor (vary the parameters of the sensor) to optimize the information collected?” A change in the sensor parameters results in a corresponding change to the metric. We show that the change in information due to reconfiguration exactly corresponds to the natural metric on the infinite-dimensional space of Riemannian metrics on the parameter manifold, restricted to finite-dimensional sub-manifold determined by the sensor parameters. The distance measure on this configuration manifold is shown to provide optimal, dynamic sensor reconfiguration based on an information criterion. Geodesics on the configuration manifold are shown to optimize the information gain but only if the change is made at a certain rate. An example of configuring two bearings-only sensors to optimally locate a target is developed in detail to illustrate the mathematical machinery, with Fast Marching methods employed to efficiently calculate the geodesics and illustrate the practicality of using this approach.


Introduction
This paper is an attempt to initiate the construction of an abstract theory of sensor management, in the hope that it will help to provide both a theoretical underpinning for the solution of practical problems and insights for future work. A key component of sensor management is the amount of information a sensor in a given configuration can gain from a measurement, and how that information gain changes with the configuration of the sensor. In this vein, it is interesting to observe how information theoretic surrogates have been used in a range of applications as objective functions for sensor management; see, for instance, [1][2][3]. Our aim here is to abstract from various papers including these, those by Kershaw and Evans [4] on sensor management, as well as others, the mathematical principles required for this theory. Our approach is set within the mathematical context of differential geometry and we aim to show that geodesics on a particular manifold provide the theoretical best-possible sensor reconfiguration in the same way that the Cramer-Rao lower bound provides a minimum variance for an estimator.
The problem of estimation is expressed in terms of a likelihood; that is, a probability density p(x|θ), where x denotes the measurement and θ the parameter to be estimated. It is well known [5] that the Fisher information associated with this likelihood provides a measure of information gained from the measurement. This in turn leads to the Fisher-Rao metric [6,7] on the parameter space. This Riemannian metric lies at the heart of our discussion.
While the concepts discussed here are, in some sense, generic and can be applied to any sensor system that has the capability to modify its characteristics, for simplicity and to keep in mind a motivating example, we focus on a particular problem: that of localization of a target. Of course, a sensor system is itself just a (more complex) aggregate sensor, but it will be convenient, for the particular problems we will discuss, to assume a discrete collection of disparate (at least in terms of location) sensors, that together provide measurements of aspects of the location of the target. This distributed collection of sensors, each drawing measurements that provide partial information about the location of a target, using known likelihoods, defines a particular sensor configuration state. As the individual sensors move, they change their sensing characteristics and thereby the collective Fisher information associated with estimation of target location. As already stated, the Fisher information matrix defines a metric, the Fisher-Rao metric, over the physical space where the target resides, or in more generality a metric over the parameter space in which the estimation process takes place, and this metric is a function of the location of the sensors. This observation permits analysis of the information content of the system, as a function of sensor parameters, in the framework of differential geometry ("information geometry") [8][9][10][11]. A considerable literature is dedicated to the problem of optimizing the configuration so as to maximize information retrieval [1,[12][13][14]. The mathematical machinery of information geometry has led to advances in several signal processing problems, such as blind source separation [15], gravitational wave parameter estimation [16], and dimensionality reduction for image retrieval [17] or shape analysis [18].
In sensor management/adaptivity applications, the performance of the sensor configuration (in terms of some function of the Fisher information) becomes a cost associated with finding the optimal sensor configuration, and tuning the metric by changing the configuration serves to optimize this cost. Literally hundreds of papers, going back to the seminal work of [4] and perhaps beyond, have used the Fisher information as a measure of sensor performance. In this context, parametrized families of Fisher-Rao metrics arise (e.g., [19,20]). Sensor management then becomes one of choosing an optimal metric (based on the value of the estimated parameter), from among a family of such, to permit the acquisition of the maximum amount of information about that parameter.
As we have stated, the focus, hopefully clarifying, example of this paper, is that of estimating the location of a target using measurements from mobile sensors (cf. [21,22]). The information content of the system depends both on the location of the target and on the spatial locations of the sensors, because the covariance of measurements is sensitive to the distances and angles made between the sensors and the target. As the sensors move in space, the associated likelihoods vary, as do the resulting Fisher matrices, describing the information content of the system, for every possible sensor arrangement. It is this interaction between sensors and target that this paper sets out to elucidate in from an information geometric viewpoint.
Perhaps the key extra idea in this paper is the observation by Gil-Medrano and Michor that the collection of all Riemannian metrics on a Riemannian manifold itself admits the structure of an infinite-dimensional Riemannian manifold [23,24]. Of interest to us is only the subset of Riemannian metrics corresponding to Fisher informations of available sensor configurations, and this allows us to restrict attention to a finite-dimensional sub-manifold of the manifold of metrics, called the sensor manifold [12,13]. In particular, a continuous adjustment of the sensor configuration, say by moving one of the individual sensors, results in a continuous change in the associated Fisher metric and so a movement in the sensor manifold.
Though computationally difficult, the idea of regarding the Fisher-Rao metric as a measure of performance of a given sensor configuration and then understanding variation in sensor configuration in terms of the manifold of such metrics is a powerful concept. It permits questions concerning optimal target trajectories, as discussed here, to minimize information passed to the sensors and, as will be discussed in a subsequent paper, optimal sensor trajectories to maximize information gleaned about the target. In particular, we remark that the metric on the space of Riemannian metrics that appears naturally in a mathematical context in [23,24] also has a natural interpretation in a statistical context.
Our aims here are to further develop the information geometric view of sensor configuration begun in [12,13]. While the systems discussed are simple and narrow in focus, already they point to concepts of information collection that appear to be new. Specifically, we setup the target location problem in an information geometric context and show that the optimal (in a manner to be made precise in Section 3) sensor trajectories, in a physical sense, are determined by solving the geodesic equations on the configuration manifold (Section 4). Various properties of geodesics on this space are derived, and the mathematical machinery is demonstrated using concrete physical examples (Section 5).

Divergences and Metrics
Here, we discuss some generalities on statistical divergences and metrics. The two divergences of interest to us are the Kullback-Leibler divergence and the Jensen-Shannon divergence. These are defined by where p and q are two probability densities on some underlying measure space. In our case, the densities p, q are taken from a single parametrized family p ( θ), where θ is in a manifold M. More specifically, the parameter θ = θ, the location of the target. A justification for the Kullback-Leibler divergence is that, in this context, given a measurement x of a target at θ, the likelihood that the same measurement could be obtained from a target at θ , L(θ, θ ), is given by the log odds expression L(θ, θ ) = log p(x|θ) p(x|θ ) , and the average over all measurements is, by definition, the Kullback-Leibler divergence [25], D KL (θ θ ). Each of the Kullback-Leibler and Jensen-Shannon divergences is regarded as a measure of information difference between the two densities. They are both non-negative and share the property that D(p q) = 0 implies that p = q almost everywhere. The Kullback-Leibler divergence is related to mutual information, as is, somewhat more circuitously the Jensen-Shannon divergence. We refer the reader to Section 17.1 of [5] for a discussion of this connection for the Kullback-Leibler, and to [26] for a brief discussion of the Jensen-Shannon divergence.
Importantly for us, as θ → θ, the first non-zero term in the series expansion is second order, viz. lim where g = g(θ) is an n × n symmetric matrix, and n is the dimension of the manifold M. This location-dependent matrix defines a Riemannian metric over M, the Fisher information metric [6,7]. It can also be calculated, under mild conditions, as the expectation of the tensor product of the gradients of the log-likelihood = log p(x|θ) as This is an example of a more general structure encapsulated by the following: The relationship between this metric and information is explored in the next section. A proof of the theorem appears in the Appendix A.

The Information in Sensor Measurements
Sensor measurements, as considered in this paper, can be formulated as follows. Suppose we have, in a fixed manifold M, a collection of N sensors located at λ i (i = 1, . . . , N). For instance, the manifold may be R 3 and location may just mean that in the usual sense in Euclidean space. The measurements from these sensors are used to estimate the location of a target θ also in M (see the left of Figure 1). Each sensor draws measurements x i from a distribution with probability density function (PDF) p i (x i |θ, ν i ), where θ represents the state of the system we are trying to estimate and ν i the noise parameters of each individual sensor. M is almost certainly not topologically spherical, it is merely drawn here as such for simplicity.
is the collected set of individual measurement from each of the sensors with likelihood These measurements are assumed independent between sensors and over time when dynamics are included. While this assumption is probably not necessary, it allows one to define the aggregate likelihood (5) as a simple product over the individual likelihoods, which renders the problem computationally more tractable. Since Fisher information is additive over independent measurements, the Fisher information metric provides a measure of the instantaneous change in information the sensors can obtain about the target. In this paper, we adopt a relatively simplistic view that the continuous collection of measurements as exemplified by, for instance, the Kalman-Bucy filter [27], is a limit of measurements discretized over time. Because sensor measurements depend on the relative locations of the sensors and target, this incremental change depends on the direction the target is moving; the Fisher metric (4) can naturally be expressed in coordinates that represent the sensor locations (see Section 4), but also depends on parameters that represent target location, which may be functions of time in a dynamical situation. Once the Fisher metric (4) has been evaluated, one can proceed to optimize it in an appropriate manner.
Another divergence will be of interest in our discussions. This is the Jensen-Shannon divergence, defined in terms of the Kullback-Leibler divergence by

D-Optimality
The information of the sensor system described in Section 3 is a matrix-valued function and so problems of optimization of 'total information' with respect to the sensor parameters are not well defined. We require a definition of 'optimal' in an information theoretic context. Several different optimization criteria exist (e.g., [28,29]), achieved by constructing scalar invariants from the matrix entries of g, and maximizing those functions in the usual way.
We adopt the notion of D-optimality in this paper; we consider the maximization of the determinant of (4). Equivalently, D-optimality maximizes the differential Shannon entropy of the system with respect to the sensor parameters [30], and minimizes the volume of the elliptical confidence regions for the sensors estimate of the location of the target θ [31].
While we have focused so far on the metric as a function of target location θ on the manifold, it is also clear that sensor parameters are involved. In applying D-optimality (or any other) criterion to this problem, we take account of the possibility that sensor locations (in the context we have described) and distributions are not fixed. Conventionally, measurements are drawn from sensors with fixed properties, with a view to estimating a parameter θ. Permitting sensors to move throughout M produces an infinite family of sensor configurations, and hence Fisher-Rao metrics (4), parametrized by the locations of the sensors. One aim of this structure is to move sensors to locations that serve to maximize information content, given some prior distribution for a particular θ ∈ M. This necessitates a tool to measure the difference between information provided by members of a family of Fisher-Rao metrics; this is explored in Section 4. We emphasize that the ideas described here will extend to situations where the sensor manifold is not identified with a copy (or several copies) of the target manifold.

Geodesics on the Sensor Manifold
We now consider the case where the target is in motion, so that θ varies along a path γ(t) ⊂ M. The instantaneous information gain by the sensor(s) at time t is then g(γ (t), γ (t)), where g is the Fisher information metric (4). We define the information provided to the sensors about the target by the measurements as the target moves along the curve as follows. Take a δ-partition , a measurement is taken represented by the likelihood p(x i |γ(t i )). Over the interval (t i+1 , t i ], the increment in information could be D(γ(t i ||γ(t i+1 )/(t i+1 − t i ) but this does not make sense for an increment as it is not symmetric. Instead, we should calculate the information gain from the endpoints of the interval to the mean distribution This is also know as the Jensen-Shannon divergence, D JS (X||Y). Based on the assumption that measurements taken over time are all independent, as the target traverses a curve γ, the total information I T gained measuring at the δ-partition, T is Letting δ go to zero gives us an information functional, I(T), defined on paths, γ, which is the equivalent of the energy functional in differential geometry (e.g., Chapter 9 of [32]) This has the same extremal paths as l g (γ), the arc-length of the path γ, Paths with extremal values of this length are geodesics and these can be interpreted as the evasive action that can be taken by the target to minimize amount of information it gives to the sensors.

Kinematic Conditions on Information Management
While the curves that are extrema of the information functional and of arc-length are the same as sets, a geodesic only minimizes the information functional if traversed at a specific speed, namely, In differential geometric terms, this is equivalent to requiring the arc-length parameterization of the geodesic to fulfill the energy condition. In order to minimize information about its location, the target should move along a geodesic of g at exactly the speed dl g /dt (10). This quantifies directly the relation between the speed of the target and the ability to identify its location. While aspects of this speed constraint are still unclear to us, an analogy that may be useful is to regard the target as moving though a "tensorial information fluid" whose flow is given by the information metric. In this setting, moving slower relative to the fluid will result in "pressure" building behind, requiring information (energy) to be expended to maintain the slower speed. Moving faster also requires more information to push through the slower moving fluid. This information is directly available to the sensors resulting in improved target position estimate.
In the fluid dynamics analogy, the energy expended in moving though a fluid is proportional to the square of the difference in speed between the fluid and the object. The local energy is proportional to the difference between actual speed and the speed desired by the geodesic; that is, the speed that minimizes the energy functional. Pursuing the fluid dynamics analogy, the density of the fluid mediates the relationship between the energy and the relative speed.
E ∝ g(δv, δv) In particular, the scalar curvature, which depends on G, influences the energy and hence the information flow. We will explore this issue further in a future publication.

The Information of Sensor Configurations
A sensor configuration is a set Γ = {λ i } N i=1 of sensor parameters. The Fisher-Rao metric g can be viewed as a function of Γ as well as θ, the location of the target. To calculate the likelihood that a measurement came from one configuration Γ 0 over another Γ 1 requires the calculation of p(x|Γ 0 , θ)/p(x|Γ 1 , θ), which is difficult as the value of θ is not known exactly. Measurements can be used to construct an estimateθ. However, the distribution of this estimate is hard to quantify and even harder to calculate. Instead, here, the maximum entropy distribution is used. This is normally distributed with meanθ, and covariance g −1 (θ), the inverse of the Fisher information metric at the estimated target location.
The information gain due to the sensor configuration Γ is now D(p(x|Γ,θ)||1) because there was no prior information about the location before the sensors were configured compared with the maximum entropy distribution p after. Note that the uniform distribution, 1 is, in general, an improper prior in the Bayesian sense unless the manifold M is of finite volume. It may be necessary, therefore, to restrict attention to some (compact) submanifold Ω ⊂ M for D(p(x|Γ,θ)||1) to be well defined (cf. the discussion below Equation (12)).

Evaluating this information gain gives
The Fisher information metric G for this divergence can be calculated from where h and k are tangent vectors to the space of metrics. The integral defining (12) may not converge for non-compact M, so restriction to a compact submanifold Ω of M is assumed throughout as necessary (cf. Figure 1).

The Manifold of Riemannian Metrics
The set of all Riemannian metrics over a manifold M can itself be imbued with the structure of an infinite-dimensional Riemannian manifold [23,24], which we call M. Points of M are Riemmanian metrics on M; i.e., each point G ∈ M bijectively corresponds to a positive-definite, symmetric (0, 2)-tensor in the space S 2 + T M. Under reasonable assumptions, an L 2 metric on M [23,33] may be defined as: which should be compared to (12).
It should be noted that the points of the manifold M comprise all of the metrics that can be put on M, most of which are irrelevant for our physical sensor management problem. We restrict consideration to a sub-manifold of M consisting only of those Riemannian metrics that are members of the family of Fisher information matrices (4) corresponding to feasible sensor configurations. This particular sub-manifold is called the 'sensor' or 'configuration' manifold [12,13] and is denoted by M(Γ), where now the objects h and k are now elements of the now finite-dimensional tangent space TM(Γ). The dimension of M(Γ) is N × dim(M) since each point of M(Γ) is uniquely described by the locations of the N-sensors, each of which require dim(M) numbers to denote their coordinates. A visual description of these spaces is given in Figure 1. For all cases considered in this paper, the integral defined in (13) is well defined and converges (see, however, the discussion in [21]).
For the purposes of computation, it is convenient to have an expression for the metric tensor components of (13) in some local coordinate system. In particular, in a given coordinate basis z i over M(Γ) (not to be confused with the coordinates on Ω; see Secion 4), the metric (13) reads G(h, k) = Ω g nk g m h mn k k vol(g).
where h and k are tangents vectors in TM(Γ) given in coordinates by From the explicit construction (14), all curvature quantities of M(Γ), such as the Riemann tensor and Christoffel symbols, can be computed.

D-Optimal Configurations
D-optimality in the context of the sensor manifold described above is discussed in this section. Suppose that the sensors are arranged in some arbitrary configuration Γ 0 . The sensors now move in anticipation of target behaviour; a prior distribution is adopted to localize a target position θ. The sensors move continuously to a new configuration Γ 1 , where Γ 1 is determined by maximizing the determinant of G, i.e., Γ 1 corresponds to the sensor locations for which det(G), computed from (14), is maximized. The physical situation is depicted graphically in Figure 2. This process can naturally be extended to the case where real measurement data are used. In particular, as measurements are drawn, a (continuously updated) posterior distribution for θ becomes available, and this can be used to update the Fisher metric (and hence the metric G) to define a sequence Γ t of optimal configurations; see Section 5.
Graphical illustration of D-optimal sensor dynamics; a sensor configuration Γ 0 evolves to a new configuration Γ 1 by moving the N sensors in Ω-space to new positions that are determined by maximizing the determinant of the metric G, given by Equation (14), on the sensor manifold M(Γ). Each sensor λ i traverses a path γ i through Ω-space to end up in the appropriate positions constituting Γ 1 . As shown in Section 4.3, the paths γ i are entropy minimizing if they are geodesic on the sensor manifold M(Γ). Note that the target is shown as stationary in this particular illustration.

Geodesics for the Configuration Manifold
While D-optimality allows us to determine where the sensors should move given some prior, it provides us with no guidance on which path the sensors should traverse, as they move through Ω, to reach their new, optimized positions.
A path Υ(t) from one configuration Γ 0 to another Γ 1 is a set of paths for each sensor S i from location λ 0 i to λ 1 i . Varying the sensor locations is equivalent to varying the metric g(t) = g(Γ(t),θ(t)) and the estimate of the target locationθ. The information gain along Υ, I(Υ), is then and the extremal paths are again the geodesics of the metric G, by analogy with (8). Also the speed constraint observed earlier in Section 3. Section 3.3 for the sensor geodesics is in place here and given by dl G dt = G g(t) (g (t), g (t)). (17) Again, this leads to the conclusion that there are kinematic constraints on the rate of change of sensor parameters that lead to the collection of the maximum amount of information. In this case, it is the sensor using information to slow or increase the rate of change of its parameters relative to (17). That information is wasted (like a form of information heat) and results in less information available for localisation of the target and, hence, a poorer estimate of it location.

Configuration for Bearings-Only Sensors
To illustrate the mathematical machinery developed in the previous sections consider first the configuration metric for two bearings-only sensors. The physical space Ω, where the sensors and the target reside, is chosen to be the square [−10, 10] × [−10, 10] ⊂ R 2 , though the sensors are 'elevated' at some height z i above the plane. Effectively, we assume an uninformative (uniform) prior over the square Ω.
The goal is to estimate a position θ from bearings-only measurements taken by the sensors, as in previous work [12,13]. We assume that measurements are drawn from a Von Mises distribution, where κ is the concentration parameter and I r is the rth modifed Bessel function of the first kind, [34], and λ n = (x n , y n ) is the location of the n-th sensor in Cartesian coordinates. Note that the parameter, κ, being a measure of angular concentration means the location error will increase and the signal-to-noise ratio decrease as the sensors move farther away from the target. For the choice (18), the Fisher metric (4) can be computed easily, and has components where κ = 1 is chosen as our noise parameter in what follows. The target lives in the plane, thus z = 0 but the sensors live in a copy of the plane at height 1, i.e., we take z i = 1. This prevents singularities from emerging within (19).

Target Geodesics
A geodesic, γ(t), starting at p with initial direction v for a manifold with metric g is the solution of the coupled second-order initial value problem for the components γ i (t): where Γ i jk are the Christoffel symbols for the metric g. Figure 3 shows directly integrated solutions to the geodesic Equation (20) a target at (−1, −3) and sensors at (−7, −6) and (0, 1) (raised at a height 1 above Ω). The differing paths correspond to the initial direction vector (cos φ, sin φ) of the target, varying as φ varies from 0 to 2π radians in steps of 0.25 radians.
The solution of the geodesic equation are generated from t = 0 to 5. this means the length of the corresponds to the information given away by the target. The longest line corresponds to the target forming a right-angle with the sensors which is the optimal arrangement for bearings only sensors. The shortest geodesics occur when the target moves on the line joining the two sensors. This usually where a location cannot be determined but this singularity is avoided in this example because the sensors are arranged above the plane inhabited by the target but it is still the place where the location is most difficult to determine.
An alternative way to numerically compute the geodesics connecting two points on the manifold M is using the Fast Marching Method (FMM) [35][36][37]. Since the Fisher-Rao information metric is a Riemannian metric on the Riemannian manifold M, one can show that the geodesic distance map u, the geodesic distance from the initial point p to a point θ, satisfies the Eikonal equation ||∇u(θ)|| g −1 with initial condition u(p) = 0. By using a mesh over the parameter space, the (isotropic or weakly anisotropic) Eikonal Equation (21), can be solved numerically by Fast Marching. The geodesic is then extracted by integrating numerically the ordinary differential equation where η t > 0 is the mesh size. The computational complexity of FMM is O (N log N), where N is the total number of mesh grid points. We perform FMM on a central computational node in this paper. If the computational capability of that node is limited, a parallel version of FMM maybe used instead and we will include it in future work. For Eikonal equations with strong anisotropy, a generalized version of FMM, the Ordered Upwind Method (OUM) [38] is preferred. Compare Figure 3 with Figure 4 which uses a Fast Marching algorithm to calculate the geodesic distance from the same point. Figure 5 shows the speed required along a geodesic travelling in the direction of the vector (1, −1). In the case of these bearing-only sensors, the trade-off over speed is between time-on-target and rate of change of relative angle between the sensors and the target. Travelling slowly means more time for the sensors to integrate measurements but the change in angle is slower, resulting in more accurate position estimates. Conversely, faster movement than the geodesic speed results in larger change in angle measurements but less time for measurements again resulting in more accurate measurements. Only at the geodesic speed is the balance reached and the minimum information criterion achieved.

Configuration Metric Calculations
The coordinates z on S Ω are {x 1 , y 1 , x 2 , y 2 }. The actual positions of the sensor are raised above Ω by a distance z i . The sensor management problem amounts to, given some initial configuration, identifying a choice of {x 1 , y 1 , x 2 , y 2 } for which the determinant of (14) is maximized, and traversing geodesics γ i in S Ω , starting at the initial locations and ending at the D-optimal locations; see Figure 2. We assume that the target location is given by an uninformative prior distribution; that is, P(θ ∈ A) = vol(A)/vol(Ω) for all A ⊂ Ω.
To make the problem tractable, we consider a simple case where one of the sensor trajectories is fixed; that is, we consider a two-dimensional submanifold of M(Γ) parameterized by the coordinates (x 2 , y 2 ) of S 2 only. Figure 6 shows a contour plot of det(G), as a function of (x 2 , y 2 ), for the case where S 1 moves from (0, 1) (yellow dot) to (2, 3) (black dot). The second sensor S 2 begins at the point (−6, −7) (red dot). Figure 6 demonstrates that, provided S 1 moves from (0, 1) to (2,3), det(G) is maximized at the point (−1, −5.5), implying that S 2 moving to (−1, −5.5) is the D-optimal choice. The geodesic path γ 1 beginning at (0, 1) and ending at (2, 3) is shown by the dashed yellow curve; this is the information-maximizing path of S 1 through Ω. Similarly for S 2 , the geodesic path γ 2 beginning at (1, −3) and ending at (−1, −5.5) is shown by the dotted red curve.
��� ��� ��� ��� Figure 6. Contour plot of det(G), for G given by (14), for the case of three sensors, where S 1 is fixed at (−2, −2), and S 2 at (2, 2) (red dot). Brighter shades indicate a greater value of det(G); det(G) is maximum at (−1, −5.5). The third sensor is allowed to move. The geodesic path linking the initial and final positions of S 3 starting at (6, −6) (yellow dot) is shown by the dashed yellow curve, while the dashed red curve shows another geodesic path linking (−6, −7) (red dot) to the D-optimal location (−1, −5.5). The actual positions of the sensors are raised above Ω by a distance z i .  (−1, −3). The actual positions of the sensor are raised above Ω by a distance 1. The differing paths correspond to φ varying through [0, 2π] in steps of 0.25 radians in the target initial direction vector (cos φ, sin φ). the integration of the geodesic equation is carried to t = 5 so the length of each line corresponds to the information obtained about the target. It is clear by that the longer lines correspond to the sensors forming a right-angle relationship to the target which is optimal for bearings-only sensors. This should be compared with Figure 8, which uses a Fast Marching algorithm [39,40] to calculate the geodesic distance from the same point. Figure 9 shows the speed required along a geodesic travelling through each point in the direction of the vector (1,1).

Discussion
We consider the problem of sensor management from an information-geometric standpoint [8,9]. A physical space houses a target, with an unknown position, and a collection of mobile sensors, each of which takes measurements with the aim of gaining information about target location [12][13][14]. The measurement process is parametrized by the relative positions of the sensors. For example, if one considers an array of sensors that take bearings-only measurements to the target (18), the amount of information that can be extracted regarding target location clearly depends on the angles between the sensors. In general, we illustrate that in order to optimize the amount of information the sensors can obtain about the target, the sensors should move to positions which maximize the norm of the volume form ('D-optimality') on a particular manifold imbued with a metric (14) which measures the distance (information content difference) between Fisher matrices [30,31]. We also show that, if the sensors move along geodesics (with respect to (14)) to reach the optimal configuration, the amount of information that they give away to the target is minimized. This paves the way for (future) discussions about game-theoretic scenarios where both the target and the sensors are competitively trying to acquire information about one another from stochastic measurements; see, e.g., [41,42] for a discussion on such games. Differential games along these lines will be addressed in forthcoming work.
It is important to note that while the measurements from sensors may be discrete and subject to sampling limits, the underlying model of target and sensor parameters is continuous. We have introduced two discretisations of our manifold to calculate approximations to our optimal reconfiguration paths and to enable some helpful visualisations. The explicit solution of the geodesic equation Figure 3 uses a numerical differential equation solver on a coordinate patch over the sensor manifold. The Fast Marching visualisation of Figure 4 uses a meshgrid over the same coordinate patch. Future work on optimal control of sensor parameters and game theoretic competitions between the information demands of sensors and targets may require more sophisticated discretisations, which available in the literature [43,44].
We hope that this work may eventually have realistic applications to signal processing problems involving parameter estimation using sensors. We have demonstrated that there is a theoretical way of choosing sensor positions, velocities, and possibly other parameters in an optimal manner so that the maximum amount of useful data can be harvested from a sequence of measurements taken by the sensors. For example, with sensors that take continuous or discrete measurements, this potentially allows one to design a system that minimizes the expected amount of time taken to localize (with some given precision) the position of a target. If the sensors move along paths that are geodesic with respect to (14), then the target, in some sense, learns the least about its trackers. This allows the sensors to prevent either intentional or unintentional evasive manoeuvres; a unique aspect of information-geometric considerations. Ultimately, these ideas may lead to improvements on search or tracking strategies available in the literature, e.g., [45,46]. Though we have only considered simple sensor models in this paper, the machinery can, in principle, be adopted to systems of arbitrary complexity. It would certainly be worth testing the theoretical ideas presented in this paper experimentally using various sensor setups.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. Now (X, d) is called a length space and d a length metric if d(x, y) = inf{len(γ) : γ is a curve from x to y }∀x, y ∈ X. (A18) If, for all x, y ∈ X, there is a curve γ that achieves the infimum, the (X, d) is called a geodesic space.
A metric space is proper if every closed and bounded set is compact. We note that any noncontracting map f : X → X is an isometry if X is a proper metric space. The following two results are in [47]: : 0 = t 0 < t 1 < · · · < t N = 1, d(γ(t k ), γ(t k−1 )) < , ∀k , (A19) and now using (A8) we see that, for our case, the length of a curve defined in this way starting from ρ KL (resp. ρ JS ) is just the integral defined in (A15) (resp. (A16)). This proves that ρ KL and ρ JS are indeed length metrics. In fact, we have the following theorems: Theorem A2. The metric ρ KL is the largest length metric ρ such that ρ(θ, θ ) 2 ≤ D KL (θ θ ) for all θ, θ ∈ M.