Limited sensor resources are a bottleneck for most surveillance systems. It is rarely possible to fulfill the requirements of large area coverage and high resolution sensor data at the same time. This article considers a surveillance scenario where an unmanned aerial vehicle (UAV) with a gimballed infrared/vision sensor monitors a certain area. The field-of-view of the camera is very narrow, so only a small part of the scene can be surveyed at a given moment. The problem is to keep track of all discovered targets, and simultaneously search for new targets, by controlling the pointing direction of the camera and the motion of the UAV. The motion of the targets (e.g., cars) is constrained by a road network, which is assumed to be prior information. The tracking and sensor management modules presented in this article are essential parts of (semi-)autonomous surveillance systems corresponding to the UAV framework presented in [1]. The goal is to increase sensor system performance by providing autonomous/automatic functionalities that can support a human system operator. The problem considered here is related to a number of different surveillance and security applications, e.g., traffic surveillance, tracking people or vehicles near critical infrastructures, or maintaining a protection zone around a military camp or a vehicle column. See [2] for a recent survey of unmanned aircraft systems and sensor payloads.

The work is based on Bayesian estimation and search methods, i.e., the search methods are based on the cumulative probability of detection and the target tracking algorithm is based on a particle filter. The exploitation of contextual information, such as maps and terrain information, is highly desirable for enhancing tracking performance, not only explicitly in the target tracking algorithm itself, but also implicitly by improving the sensor management. In this work, the road-constrained target assumption is an important aspect of the problem formulation, and it is used extensively to improve both tracking and planning. A single platform with a single pan/tilt tele-angle camera is considered, see Figure 1.

An alternative and interesting setup is one where a second, possibly fixed, wide-angle camera provides points of interest for the tele-angle camera to investigate. Such a setup would probably improve performance, but the planning problem would remain similar to the one considered in this article, so the problem treated here remains highly relevant.

#### 1.1. Background and Literature Survey

Sensor management aims at managing and coordinating limited sensor and system resources to accomplish specific and dynamic mission objectives [3,4]. Algorithms and methods solving realistic sensor management problems are computationally very demanding, since realistic models of the environment, sensors and platforms are very complex due to the nonlinear and stochastic properties of the world. The optimal solution is usually impossible to find, but in practice this is not critical, since there is in general a large number of suboptimal solutions that are sufficiently good. However, finding a good suboptimal solution is still difficult.

A common assumption, utilized also in this work, is that the system can be modeled by a first-order Markov process, i.e., given all of the past states, the current state depends only on the most recent one. A Markov Decision Process (MDP) [5,6] is a sequential decision problem defined by a state set, an action set, a Markovian state transition model and an additive reward function. MDP problems can be solved using dynamic programming techniques, e.g., the value iteration algorithm [7,8]. In an MDP the current system state is always known. The case where the system state variables are not fully observable is called a Partially Observable MDP (POMDP). A POMDP can be transformed into an MDP whose state is the sufficient statistic, also called the belief state. The belief state is updated according to recursive Bayesian estimation methods. Usually the state space, action space and observation space are assumed to be finite in a POMDP problem.
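To make the value iteration algorithm mentioned above concrete, the following is a minimal sketch for a finite MDP; the two-state example in the comments is hypothetical and not a model from this article:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Value iteration for a finite MDP.
    P[a][s, s'] : transition probabilities under action a,
    R[a][s]     : expected immediate reward for action a in state s,
    gamma       : discount factor."""
    n_actions, n_states = len(P), P[0].shape[0]
    V = np.zeros(n_states)
    while True:
        # Bellman backup: Q(a, s) = R(a, s) + gamma * E[V(s')]
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # value function, greedy policy
        V = V_new

# Hypothetical two-state example: action 0 stays put and pays the
# state-dependent reward, action 1 moves to state 1 without reward.
P = [np.eye(2), np.array([[0., 1.], [0., 1.]])]
R = [np.array([0., 1.]), np.array([0., 0.])]
V, policy = value_iteration(P, R)  # optimal: act 1 in state 0, act 0 in state 1
```

The same backup is the computational core of solving a POMDP once the belief state has been discretized, which is one reason the finite MDP machinery is worth stating explicitly.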

One of the first algorithms for an exact solution of the POMDP was given by Sondik [9]. More efficient exact solvers have been developed, but still only very small problems can be solved optimally. However, different suboptimal methods have also been proposed (see, e.g., [10–12]). The suboptimal methods usually approximate the belief space, the value function, or both. The development of faster computers with more memory is also one reason for the recent increase of interest in POMDP problems.

In receding horizon control (RHC) only a finite planning horizon is considered. The extreme case is myopic planning, where the next action is based only on the immediate consequence of that action. A related approach is roll-out, where an optimal solution scheme is used for a limited time horizon and a base policy is applied beyond that time point. The base policy is suboptimal, but should be easy to compute. He and Chong [13] solve a sensor management problem by applying a roll-out approach based on a particle filter. Miller et al. [14] propose a POMDP approximation based on a Gaussian target representation and the use of nominal state trajectories in the planning, and apply the method to a UAV guidance problem for multiple target tracking. He et al. [15] represent the target belief as a multi-modal Gaussian, which is exploited in the planning of a tracking problem with road-constrained targets.
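The roll-out idea above can be sketched in a few lines. This is a generic Monte Carlo version, not the specific algorithm of [13]; the `simulate` and `base_policy` interfaces are hypothetical placeholders for a problem-specific generative model and a cheap heuristic policy:

```python
def rollout_action(state, actions, simulate, base_policy, horizon=10, n_mc=50):
    """One-step roll-out: for each candidate first action, simulate that
    action followed by the cheap base policy for the remaining horizon, and
    average the accumulated reward over n_mc Monte Carlo runs.
    simulate(s, a) is assumed to return a (next_state, reward) sample."""
    best_action, best_value = None, float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(n_mc):
            s, r = simulate(state, a)       # consequence of the candidate action
            total += r
            for _ in range(horizon - 1):    # base policy takes over
                s, r = simulate(s, base_policy(s))
                total += r
        value = total / n_mc
        if value > best_value:
            best_action, best_value = a, value
    return best_action
```

With a particle-filter belief, `state` would be the particle set and `simulate` would propagate particles and score predicted measurements, but that machinery is problem specific.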

One suboptimal approach is to approximate the original problem with a new problem where some theoretical results in planning and control can be applied, making the problem simpler to solve. One example in the sensor management context is when multi-target tracking planning problems are treated as a multi-armed bandit (MAB) problem (see [4] for a survey). A classical MAB problem contains a number of independent processes (called machines or bandits). An operated machine generates a state dependent reward. The planning problem is to select the sequence of operated machines that maximizes the expected return. The problem can be solved by dynamic programming, but it can be shown that a reward index policy is optimal. Thus, the Gittins index (named after its inventor) can be computed independently for all machines, and the optimal policy is then to operate the machine with the largest index. In sensor management this formulation is used in multi-target tracking applications, similar to the problem in this article, where each machine is a target and the problem is to decide which target to update. However, one major disadvantage of the MAB formulation is the assumption that the states of all non-operated machines are frozen. This is an unrealistic assumption in target tracking, where the targets are moving. There are attempts to overcome this by considering restless bandits [16,17]. In multi-function radar applications a scheduling algorithm is needed to schedule different search, tracking and engagement functions. Many algorithms are based on the MAB problem formulation, but there are also other approaches (see [18] and the references therein). In applications with a vision sensor and a moving sensor platform the MAB formulation is not suitable, since switching between different targets cannot be performed instantaneously. There are MAB variants where a switching penalty is included, but this violates the optimality of the index policy.
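For a finite-state machine, the Gittins index can be computed with the restart-in-state characterization due to Katehakis and Veinott: the index of state s equals (1 − β) times the value of s in a modified MDP where one may either continue operating or restart from s. A minimal sketch (an illustration, not part of the method in this article):

```python
import numpy as np

def gittins_indices(P, r, beta=0.9, tol=1e-10):
    """Gittins index of each state of a single machine with transition
    matrix P and reward vector r, via value iteration on the restart-in-
    state problem: index(s) = (1 - beta) * V_s(s)."""
    n = len(r)
    idx = np.zeros(n)
    for s in range(n):
        V = np.zeros(n)
        while True:
            cont = r + beta * P @ V            # keep operating from each state
            restart = r[s] + beta * P[s] @ V   # jump back to state s
            V_new = np.maximum(cont, restart)
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        idx[s] = (1 - beta) * V[s]
    return idx
```

The index policy then simply operates the machine whose current state has the largest index; it is the freezing of all other machines during that operation that makes the policy optimal, and that fails for moving targets.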

Search theory is the study of how to optimally employ limited resources when searching for objects of unknown location [19,20]. Search problems can be broadly categorized into one-sided and two-sided problems. In a one-sided search problem the searcher can choose a strategy but the target cannot; in other words, the target does not react to the search. In two-sided search problems both the searcher and the target can choose strategies, which relates these problems to game-theoretic methods. In this work only one-sided search problems are considered.

The elements of a basic search problem are: (a) a prior probability distribution of the search object's location; (b) a detection function relating the search effort to the probability of detecting the object, given that the object is in the scanned area; (c) a constrained amount of search effort; and (d) an optimization criterion representing the probability of success. Two criteria are commonly used in search strategy optimization: the probability of finding the target within a given time interval, and the expected time to find the target.
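These elements combine into a simple objective. A minimal sketch, assuming a discrete cell grid and the exponential detection function common in classical search theory (the cell layout and the effectiveness parameter `w` are hypothetical):

```python
import numpy as np

def cumulative_detection_prob(prior, effort, w=1.0):
    """Probability of having detected the object after allocating search
    effort z_i to cell i, given prior p_i on the object's location and an
    exponential detection function b(z) = 1 - exp(-w z):
        P(detect) = sum_i p_i * (1 - exp(-w * z_i))."""
    prior = np.asarray(prior, dtype=float)
    effort = np.asarray(effort, dtype=float)
    return float(np.sum(prior * (1.0 - np.exp(-w * effort))))
```

The diminishing returns of the exponential detection function are what make the optimal allocation spread effort over several cells rather than dwell only on the most probable one.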

The classical search theory, as developed by Koopman et al. [21], is mainly concerned with determining the optimal search effort density for one-sided problems, i.e., how large a fraction of the available time should be spent in each part of the search region given a prior distribution of the target location. Much research has been done based on the early work by Koopman; different types of detection functions, different search patterns, more general target prior distributions, moving targets and moving searchers have all been investigated (see, e.g., [22–24]). Several recent papers consider a (multi-)UAV search for targets using some Bayesian approach, see for instance Bourgault et al. [25] and Furukawa et al. [26]. A common approach is to represent the target density by a fixed discrete probability grid, with the goal of maximizing the number of detected targets; the search performance is then represented by a cumulative probability of detection. In recent years particle mixtures have also been used to represent the target probability density in search and rescue applications (see, e.g., [27–30]). In this work two different filters are used, similar to the approach in [31]: a grid-based filter for hypothesized but undiscovered targets, and, once a target is detected, a particle filter [32].
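The measurement update of such a grid-based filter when a scan yields no detection is a standard Bayesian step. A minimal sketch, assuming a known per-cell detection probability `p_detect` and a hypothetical boolean mask of scanned cells:

```python
import numpy as np

def grid_update_no_detection(grid, scanned_mask, p_detect=0.9):
    """Bayesian update of a probability grid for an undiscovered target
    after scanning some cells without detecting it: the probability of a
    scanned cell is scaled by the miss probability (1 - Pd), and the grid
    is renormalized. Probability mass thus flows to unscanned cells."""
    posterior = grid * np.where(scanned_mask, 1.0 - p_detect, 1.0)
    return posterior / posterior.sum()
```

A prediction step (diffusing the grid along the road network between scans) would be interleaved with this update for moving targets.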

Classical multi-target tracking consists of three sub-problems: detection, association, and estimation [33,34]. There are different detection approaches; boosting is one popular and powerful method (see [1] and the references therein). An alternative for dynamic targets is optical flow techniques [35]. Target tracking with road network information requires methods that can preserve the inherent multi-modality of the underlying probability densities. The first attempts [36–38] used jump Markov (non)linear systems in combination with the variable structure interacting multiple model (VS-IMM) algorithm [39,40]. Important alternatives to IMM-based methods appear in [41,42], which propose variable structure multiple model particle filters (VS-MMPF) where road constraints are handled using the concept of directional process noise. In [43] the roads are 3D curves represented by linear segments, and the road network is represented as a graph with roads and intersections as the edges and nodes, respectively. The position and velocity along a single road are modeled by a standard linear Gauss–Markov model. Since particle filters can handle nonlinear and non-Gaussian models, the user has much more modeling freedom than with Kalman filter and IMM approaches. In this work the road target tracking approach in [44] is used, but the association problem is ignored by assuming good discrimination among the targets.
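The prediction step of such a road-constrained particle filter can be sketched as follows. This is an illustration rather than the exact model of [44]; the dictionary road-graph representation and the uniform choice among outgoing roads at intersections are simplifying assumptions:

```python
import random

def propagate_particle(particle, roads, dt=1.0, accel_std=0.5):
    """Propagate one road-constrained particle (road id, arc-length s,
    speed v) one time step: a Gauss-Markov model moves the particle along
    the road, and at the end of a road one of the connecting roads is
    chosen at random, which preserves the multi-modality of the density."""
    road, s, v = particle
    v += random.gauss(0.0, accel_std) * dt          # random acceleration
    s += v * dt                                     # move along the road
    while s > roads[road]["length"]:                # passed an intersection
        s -= roads[road]["length"]
        road = random.choice(roads[road]["next"])   # pick an outgoing road
    return (road, max(s, 0.0), v)
```

Because each particle carries its own road hypothesis, the particle cloud naturally splits into one mode per outgoing road after an intersection, which is exactly the multi-modality that Kalman-filter-based models struggle to represent.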