The Role of Visibility in Pursuit / Evasion Games

The cops-and-robber (CR) game has been used in mobile robotics as a discretized model (played on a graph G) of pursuit/evasion problems. The "classic" CR version is a perfect information game: the cops' (pursuers') locations are always known to the robber (evader) and vice versa. Many variants of the classic game can be defined: the robber can be invisible, and he can be either adversarial (trying to avoid capture) or drunk (performing a random walk). Furthermore, the cops and robber can reside in either the nodes or the edges of G. Several of these variants are relevant as models of robotic pursuit/evasion. In this paper, we first define carefully several of the variants mentioned above and related quantities such as the cop number and the capture time. Then we introduce and study the cost of visibility (COV), a quantitative measure of the increase in difficulty (from the cops' point of view) when the robber is invisible. In addition to our theoretical results, we present algorithms which can be used to compute capture times and COV of graphs which are analytically intractable. Finally, we present the results of applying these algorithms to the numerical computation of COV.


Introduction
Pursuit / evasion (PE) and related problems (search, tracking, surveillance) have been the subject of extensive research in the last fifty years and much of this research is connected to mobile robotics [7]. When the environment is represented by a graph, the original PE problem reduces to a graph game played between the pursuers and the evader.
In the current paper, inspired by Isler and Karnad's recent work [21], we study the role of information in cops-and-robber (CR) games, an important version of graph-based PE. By "information" we mean specifically the players' location. For example, we expect that when the cops know the robber's location they can do better than when the robber is "invisible". Our goal is to make precise the term "better".
Reviews of the graph theoretic CR literature appear in [3,5,12]. In the "classical" CR variant [30] it is assumed that the cops always know the robber's location and vice versa. The "invisible" variant, in which the cops cannot see the robber (but the robber always sees the cops) has received less attention in the graph theoretic literature; among the few papers which treat this case we mention [20,21,22,9] and also [1] in which both cops and robber are invisible.
Both the visible and invisible CR variants are natural models for discretized robotic PE problems; the connection has been noted and exploited relatively recently [20,21,38]. If it is further assumed that the robber is not actively trying to avoid capture (the case of drunk robber) we obtain a one-player graph game; this model has been used quite often in mobile robotics [13,16,17,26,35] and especially (when assuming random robber movement) in publications such as [19,25,32,36,37], which utilize partially observable Markov decision processes (POMDP, [15,27,29]). For a more general overview of pursuit/evasion and search problems in robotics, the reader is referred to [7]; some of the works cited in this paper provide a useful background to the current paper.
This paper is structured as follows. In Section 2 we present preliminary material, notation and the definition of the "classical" CR game; we also introduce several node and edge CR variants. In Section 3 we define rigorously the cop number and capture time for the classical CR game and the previously introduced CR variants. In Section 4 we study the cost of visibility (COV). In Section 5 we present algorithms which compute capture time and optimal strategies for several CR variants. In Section 6 we further study COV using computational experiments. Finally, in Section 7 we summarize and present our conclusions.

We use standard set notation: A \ B = {x : x ∈ A, x ∉ B} denotes set difference, and |A| denotes the cardinality of A (i.e., the number of its elements).

Preliminaries
A graph G = (V, E) consists of a node set V and an edge set E, where every e ∈ E has the form e = {x, y} ⊆ V. In other words, we are concerned with finite, undirected, simple graphs; in addition we will always assume that G is connected and contains n nodes: |V| = n. Furthermore, we will assume, without loss of generality, that the node set is V = {1, 2, ..., n}. We let N(u) = {v : {u, v} ∈ E} denote the neighborhood of u and N[u] = N(u) ∪ {u} its closed neighborhood. We also define the set of "diagonal" node pairs V^2_D = {(x, x) : x ∈ V} ⊆ V^2. Finally, we will write f(n) = o(g(n)) if and only if lim_{n→∞} f(n)/g(n) = 0. Note that in this asymptotic notation n denotes the parameter with respect to which asymptotics are considered; so in later sections we will write o(n), o(M) etc.

The CR Game Family
The "classical" CR game can be described as follows. Player C controls K cops (with K ≥ 1) and player R controls a single robber. Cops and robber are moved along the edges of a graph G = (V, E) in discrete time steps t ∈ N_0. At time t, the robber's location is Y_t ∈ V and the cops' locations are X_t = (X^1_t, X^2_t, ..., X^K_t) ∈ V^K (for t ∈ N_0 and k ∈ [K]). The game is played in turns: in the 0-th turn first C places the cops on nodes of the graph and then R places the robber; in the t-th turn, for t > 0, first C moves the cops to X_t and then R moves the robber to Y_t. Two types of moves are allowed: (a) sliding along a single edge and (b) staying in place; in other words, for all t and k, either X^k_t ∈ N(X^k_{t-1}) or X^k_t = X^k_{t-1} (and similarly for the robber). The cops win if they capture the robber, i.e., if there exist t ∈ N_0 and k ∈ [K] such that Y_t = X^k_t; the robber wins if for all t ∈ N_0 and k ∈ [K] we have Y_t ≠ X^k_t. In what follows we will describe these eventualities by the shorthand notation Y_t ∈ X_t and Y_t ∉ X_t (i.e., in this notation we consider X_t as a set of cop positions).
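As an illustration, the move rules just described are easy to encode. The following Python sketch (the three-node path graph and all names — adj, closed_nbhd, cop_moves, captured — are our own, not the paper's) enumerates legal cop moves and tests capture:

```python
from itertools import product

# Hypothetical encoding of the CR move rules on the path 1-2-3.
adj = {1: {2}, 2: {1, 3}, 3: {2}}

def closed_nbhd(v):
    # legal destinations from v: slide along one edge or stay in place
    return adj[v] | {v}

def cop_moves(x):
    # every legal next placement for a tuple x of K cop positions
    return list(product(*(closed_nbhd(v) for v in x)))

def captured(x, y):
    # the robber at y is captured iff he shares a node with some cop
    return y in set(x)
```

For K = 2 cops at nodes 1 and 3 there are 2 × 2 = 4 legal joint moves, since each cop may stay or slide to node 2.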
In the classical game both C and R are adversarial: C plays to effect capture and R plays to avoid it. But there also exist "drunk robber" versions, in which the robber simply performs a random walk on G: for all u, v ∈ V we have

Pr(Y_{t+1} = v | Y_t = u) = 1/deg(u) if v ∈ N(u), and 0 otherwise. (1)

In this case we can say that no R player is present (or, following a common formulation, we can say that the R player is "Nature").
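The drunk robber's random walk can be simulated directly. In this sketch (our own names; we assume, as in (1), that the robber always moves to a uniformly random neighbour and never stays put) we simply generate a trajectory on a small path graph:

```python
import random

# Simulation of the drunk robber's uniform random walk on the path 1-2-3.
adj = {1: [2], 2: [1, 3], 3: [2]}

def drunk_walk(y0, steps, rng=random):
    ys = [y0]
    for _ in range(steps):
        ys.append(rng.choice(adj[ys[-1]]))   # uniform random neighbour
    return ys
```

Every consecutive pair in the returned trajectory is an edge of G, as the move rules require.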
If an R player exists, the cops' locations are always known to him; on the other hand, the robber can be either visible (his location is known to C) or invisible (his location is unknown). Hence we have four different CR variants, as detailed in Table 1.

In all of the above CR variants both cops and robber move from node to node. This is a good model for entities (e.g., robots) which move from room to room in an indoor environment. There also exist cases (for example moving in a maze or a road network) where it makes more sense to assume that both cops and robber move from edge to edge. We will call the classical version of the edge CR game edge av-CR; it has attracted attention only recently [10]. Edge ai-CR, dv-CR and di-CR variants are also possible, in analogy to the node versions listed in Table 1. Each of these cases can be reduced to the corresponding node variant, with the edge game taking place on the line graph L(G) of G.

Cop Number and Capture Time
Two graph parameters which can be obtained from the av-CR game are the cop number and the capture time. In this section we will define these quantities in game theoretic terms and also consider their extensions to other CR variants. Before examining each of these CR variants in detail, let us mention a particular modification which we will apply to all of them. Namely, we assume that (every variant of) the CR game is played for an infinite number of rounds. This is obviously the case if the robber is never captured; but we also assume that, in case the robber is captured at some time t*, the game continues for t ∈ {t* + 1, t* + 2, ...} with the following restriction: for all t ≥ t*, we have Y_t = X^{k*}_t (where k* is the index of the cop who effected the capture).

The Node av-CR Game
We will define cop number and capture time in game theoretic terms. To this end we must first define histories and strategies.
A particular instance of the CR game can be fully described by the sequence of cops' and robber's locations; these locations are fully determined by the C and R moves. So, if we let x_t ∈ V^K (resp. y_t ∈ V) denote the nodes into which C (resp. R) places the cops (resp. the robber) at time t, then a history is a sequence x_0 y_0 x_1 y_1 .... Such a sequence can have finite or infinite length; we denote the set of all finite length histories by H^(K)_*; note that there exists an infinite number of finite length histories. By convention, H^(K)_* also includes the zero-length or null history, which is the empty sequence, denoted by λ. Finally, we denote the set of all infinite length histories by H^(K)_∞. Since both cops and robber are visible and the players move sequentially, av-CR is a game of perfect information; in such a game C loses nothing by limiting himself to pure (i.e., deterministic) strategies [24]. A pure cop strategy is a function s_C : H^(K)_* → V^K; a pure robber strategy is a function s_R : H^(K)_* → V. In both cases the idea is that, given a finite length history, the strategy produces the next cop or robber move; for example, when the robber strategy s_R receives the input x_0, it will produce the output y_0 = s_R(x_0); when it receives x_0 y_0 x_1, it will produce y_1 = s_R(x_0 y_0 x_1) and so on. We will denote the set of all legal cop strategies by S^(K)_C and the set of all legal robber strategies by S^(K)_R; a strategy is "legal" if it only provides moves which respect the CR game rules. The set S̃^(K)_C (resp. S̃^(K)_R) is the set of memoryless legal cop (resp. robber) strategies, i.e., strategies which depend only on the current cops' and robber's positions; we will denote memoryless strategies by Greek letters, e.g., σ_C, σ_R etc.
In other words, a memoryless cop strategy can be written as σ_C(x, y) and a memoryless robber strategy as σ_R(x, y), where (x, y) is the current position of the game. It seems intuitively obvious that both C and R lose nothing by playing with memoryless strategies (i.e., computing their next moves based on the current position of the game, not on its entire history). This is true but requires a proof. One approach to this proof is furnished in [6,14]. But we will present another proof by recognizing that the CR game belongs to the extensively researched family of reachability games [4,28].
A reachability game is played by two players (Player 0 and Player 1) on a digraph Ḡ = (V̄, Ē); each node v ∈ V̄ is a position and each edge is a move; i.e., the game moves from node to node (position) along the edges of the digraph. The game is described by the tuple (V̄_0, V̄_1, Ē, F), where V̄_i is the set of positions (nodes) from which the i-th player makes the next move; the game terminates with a win for Player 0 if and only if a move takes place into a node v ∈ F (the target set of Player 0); if this never happens, Player 1 wins. The following is well known [4,28].
Theorem 3.1 Let (V̄_0, V̄_1, Ē, F) be a reachability game on the digraph Ḡ = (V̄, Ē). Then V̄ can be partitioned into two sets W_0 and W_1 such that (for i ∈ {0, 1}) Player i has a memoryless strategy σ_i which is winning whenever the game starts in u ∈ W_i.
We can convert the av-CR game with K cops to an equivalent reachability game which is played on the CR game digraph. In this digraph every node corresponds to a position of the original CR game; a (directed) edge from node u to node v indicates that it is possible to get from position u to position v in a single move. The CR game digraph has three types of nodes.
1. Nodes of the form u = (x, y, p) correspond to positions (in the original CR game) with the cops located at x ∈ V K , the robber at y ∈ V and player p ∈ {C, R} being next to move.
2. There is a single node u = (λ, λ, C) which corresponds to the starting position of the game: neither the cops nor the robber have been placed on G; it is C's turn to move (recall that λ denotes the empty sequence).
3. Finally, there exist n^K nodes of the form u = (x, λ, R): the cops have just been placed on the graph (at positions x ∈ V^K) but the robber has not been placed yet; it is R's turn to move.
Let us now define V^(K) to be the set of all nodes of the above three types, and let E^(K) consist of all pairs (u, v) where u, v ∈ V^(K) and the move from u to v is legal.
Finally, we recognize that C's target set is F^(K) = {(x, y, p) : y ∈ x}, i.e., the set of all positions in which the robber is in the same node as at least one cop.
With the above definitions, we have mapped the classical CR game (played with K cops on the graph G) to the reachability game (V^(K)_0, V^(K)_1, E^(K), F^(K)). By Theorem 3.1, V^(K) can be partitioned into sets W^(K)_0 and W^(K)_1 with the following property: whenever the reachability game starts at some u ∈ W^(K)_i, Player i has a winning strategy (it may be the case, for specific G and K, that either of W^(K)_0, W^(K)_1 is empty). Recall that in our formulation of CR as a reachability game, Player 0 is C. In reachability terms, the statement "C has a winning strategy in the classical CR game" translates to "(λ, λ, C) ∈ W^(K)_0" and, for a given graph G, the validity of this statement will in general depend on K. It is clear that

(λ, λ, C) ∈ W^(K)_0 implies (λ, λ, C) ∈ W^(K+1)_0, (2)

since additional cops can never hurt C. It is also clear that

(λ, λ, C) ∈ W^(|V|)_0, (3)

because, if C has |V| cops, he can place one in every u ∈ V and win immediately. Based on (2) and (3) we can define the cop number of G to be the minimum number of cops that guarantee capture; more precisely we have the following definition (which is equivalent to the "classical" definition of cop number [2]).

Definition 3.2 The cop number of G is c(G) = min{K : (λ, λ, C) ∈ W^(K)_0}.
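The winning set W_0 of Theorem 3.1 can be computed by the standard "attractor" iteration: W_0 is the least set containing the target F that Player 0 can force the play into. The following Python sketch (names are ours, and we assume every Player-1 position has at least one outgoing move) illustrates the idea on a generic game graph:

```python
# Sketch of Player 0's attractor computation for a reachability game
# (V0, V1, E, F): grow W from the target set F by adding Player-0 nodes
# with SOME successor in W and Player-1 nodes with ALL successors in W.
def attractor(nodes, edges, owner, target):
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    W = set(target)
    changed = True
    while changed:
        changed = False
        for u in nodes:
            if u in W:
                continue
            if owner[u] == 0 and any(v in W for v in succ[u]):
                W.add(u)            # Player 0 *can* enter W
                changed = True
            elif owner[u] == 1 and succ[u] and all(v in W for v in succ[u]):
                W.add(u)            # Player 1 *must* enter W
                changed = True
    return W   # Player 0 wins exactly from the nodes of W
```

Applied to the CR game digraph above (with Player 0 = C and target F^(K)), the returned set is W^(K)_0, and the cop number is the least K for which the starting node (λ, λ, C) belongs to it.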
While a cop winning strategy s_C guarantees that the game token will go into (and remain in) the target set F^(K), we still do not know how long it will take for this to happen. However, it is easy to prove that, if K ≥ c(G) and C uses a memoryless winning strategy, then no game position will be repeated until capture takes place. (In fact, for K = |V|, we have W^(|V|)_0 = V^(|V|), because from every position (x, y, p), C can move the cops so that one cop resides in each u ∈ V, which guarantees immediate capture.) Hence the following holds.

Theorem 3.3 For every G, let K ≥ c(G) and consider the CR game played on G with K cops. There exists a memoryless cop winning strategy σ_C and a number T(K; G) < ∞ such that, for every robber strategy s_R, C wins in no more than T(K; G) rounds.
Let us now turn from winning to time optimal strategies. To define these, we first define the capture time, which will serve as the CR payoff function.
Definition 3.4 Given a graph G, some K ∈ N and strategies s_C ∈ S^(K)_C, s_R ∈ S^(K)_R, the capture time is

T^(K)(s_C, s_R | G) = min{t : Y_t ∈ X_t}; (4)

in case capture never takes place, we let T^(K)(s_C, s_R | G) = ∞.

We will assume that R's payoff is T^(K)(s_C, s_R | G) and C's payoff is −T^(K)(s_C, s_R | G) (hence av-CR is a two-person zero-sum game). Note that the capture time (i) obviously depends on K and (ii) for a fixed K is fully determined by the s_C and s_R strategies. Now, following standard game theoretic practice, we define optimal strategies.

Definition 3.5 For every graph G and K ∈ N, the strategies s*_C ∈ S^(K)_C and s*_R ∈ S^(K)_R are optimal if

sup_{s_R} T^(K)(s*_C, s_R | G) = sup_{s_R} inf_{s_C} T^(K)(s_C, s_R | G) = inf_{s_C} sup_{s_R} T^(K)(s_C, s_R | G) = inf_{s_C} T^(K)(s_C, s*_R | G). (5)

The value of the av-CR game played with K cops is the common value of the two sides of (5) and we denote it T^(K)(s*_C, s*_R | G).

We emphasize that the validity of (5) is not known a priori. C (resp. R) can guarantee that he loses no more than inf_{s_C} sup_{s_R} T^(K)(s_C, s_R | G) (resp. gains no less than sup_{s_R} inf_{s_C} T^(K)(s_C, s_R | G)), and in general

sup_{s_R} inf_{s_C} T^(K)(s_C, s_R | G) ≤ inf_{s_C} sup_{s_R} T^(K)(s_C, s_R | G). (6)

But, since av-CR is an infinite game (i.e., depending on s_C and s_R, it can last an infinite number of turns), it is not clear that equality holds in (6) and, even when it does, the existence of optimal strategies which achieve the value is not guaranteed. In fact it can be proved that, for K ≥ c(G), av-CR has both a value and optimal strategies. The details of this proof will be reported elsewhere, but the gist of the argument is the following. Since av-CR is played with K ≥ c(G) cops, by Theorem 3.3, C has a memoryless strategy which guarantees the game will last no more than T(K; G) turns. Hence av-CR with K ≥ c(G) essentially is a finite zero-sum two-player game; it is well known [31] that every such game has a value and optimal memoryless strategies. In short, we have the following.

Theorem 3.6 Given any graph G and any K ≥ c(G), for the av-CR game there exists a pair (σ*_C, σ*_R) of memoryless time optimal strategies which achieve the value of the game.

Hence we can define the capture time ct(G) of a graph G to be the value of av-CR when played on G with K = c(G) cops.

The Node dv-CR Game
In this game the robber is visible and performs a random walk on G (drunk robber) as indicated by (1). In the absence of cops, Y_t is a Markov chain on V with transition probability matrix P, where for every u, v ∈ {1, 2, ..., |V|} we have P_{uv} = 1/deg(u) if v ∈ N(u) and P_{uv} = 0 otherwise. In the presence of one or more cops, {Y_t}_{t=0}^∞ is a Markov decision process (MDP) [33] with state space V ∪ {n + 1} (where n + 1 is the capture state) and transition probability matrix P(X_t) (obtained from P as shown in [23]); in other words, X_t is the control variable, selected by C.
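The passage from P to the controlled matrix P(X_t) can be sketched as follows. The exact construction in [23] may differ in details; in this illustrative sketch (our own names, on the 0-indexed path 0-1-2) we simply send to the capture state any robber who shares a node with a cop or steps onto one:

```python
import numpy as np

# Controlled transition matrix for the drunk robber on the path 0-1-2,
# with capture state n+1 appended as an absorbing state.
adj = {0: [1], 1: [0, 2], 2: [1]}
n = 3

def controlled_P(cops):
    P = np.zeros((n + 1, n + 1))
    P[n, n] = 1.0                          # capture state is absorbing
    for u in range(n):
        if u in cops:
            P[u, n] = 1.0                  # robber already shares a node
            continue
        for v in adj[u]:                   # uniform random-walk step (1)
            P[u, n if v in cops else v] += 1.0 / len(adj[u])
    return P
```

With a cop at the centre node 1, every row leads to the capture state in one step, since both leaves have node 1 as their only neighbour.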
Since no robber strategy is involved, the capture time on G depends only on the (K-cop) strategy s_C; namely

T^(K)(s_C | G) = min{t : Y_t ∈ X_t},

which can also be written as

T^(K)(s_C | G) = Σ_{t=0}^∞ 1(Y_t ∉ X_t),

where 1(Y_t ∉ X_t) equals 1 if Y_t does not belong to X_t (taken as a set of cop positions) and 0 otherwise. Since the robber performs a random walk on G, T^(K)(s_C | G) is a random variable, and C wants to minimize its expected value

E[T^(K)(s_C | G)]. (9)

The minimization of (9) is a typical undiscounted, infinite horizon MDP problem. Using standard MDP results [33] we see that (i) C loses nothing by determining X_0, X_1, ... through a memoryless strategy σ_C(x, y) and (ii) for every K ≥ 1, E[T^(K)(σ_C | G)] is well defined. Furthermore, for every K ∈ N there exists an optimal strategy σ^(K)_C which minimizes E[T^(K)(σ_C | G)]; hence we have the following.

Theorem 3.8 Given any graph G and K ∈ N, for the dv-CR game played on G with K cops there exists a memoryless strategy σ^(K)_C which minimizes E[T^(K)(σ_C | G)].

Note that, even though a single cop suffices to capture the drunk robber on any G, we have chosen to define the drunk capture time dct(G) to be the capture time for K = c(G) cops; we have done this to make (in Section 4) an equitable comparison between ct(G) and dct(G).

The Node ai-CR Game
This is not a perfect information game, since C cannot see R's moves. Hence C and R must use mixed strategies s_C, s_R. A mixed strategy s_C (resp. s_R) specifies, for every t, a conditional probability distribution over the cops' (resp. robber's) next move, given the information available to the respective player; together these induce a probability measure on game histories, which in turn determines R's expected gain (and C's expected loss), namely E[T^(K)(s_C, s_R | G)]. Similarly to av-CR, C can guarantee an expected payoff no greater than the upper value of the game and R an expected payoff no less than the lower value. If the two coincide, we denote the common value by v^(K) and call it the value of the ai-CR game (played on G, with K cops); a pair of strategies achieving the value is called optimal. In [22] we have studied the ai-CR game and proved that it does indeed have a value and optimal strategies. We give a summary of the relevant argument; proofs can be found in [22].
First, invisibility does not increase the cop number. In other words, there is a cop strategy (involving c(G) cops) which guarantees bounded expected capture time for every robber strategy s_R. More precisely, we have proved the following.

Theorem 3.10 Given any graph G and K ≥ c(G), there exists a cop strategy s^(K)_C (under which the cops perform random walks on G) such that, for every robber strategy s_R, E[T^(K)(s^(K)_C, s_R | G)] < ∞.
Now consider the "m-truncated" ai-CR game which is played exactly as the "regular" ai-CR but lasts at most m turns. Strategies s_R ∈ S^(K)_R and s_C ∈ S^(K)_C can be used in the m-truncated game: C and R use them only until the m-th turn. Let R receive one payoff unit for every turn in which the robber is not captured; denote the payoff of the m-truncated game (when strategies s_C, s_R are used) by T^(K)_m(s_C, s_R | G). The expected payoff of the m-truncated game is E[T^(K)_m(s_C, s_R | G)]. Because it is a finite, two-person, zero-sum game, the m-truncated game has a value and optimal strategies; namely, the value is

v^(K)_m = min_{s_C} max_{s_R} E[T^(K)_m(s_C, s_R | G)] = max_{s_R} min_{s_C} E[T^(K)_m(s_C, s_R | G)],

and there exist optimal strategies which attain it. In [22] we use the truncated games to prove that the "regular" ai-CR game has a value, an optimal C strategy and ε-optimal R strategies. More precisely, we prove the following.
Theorem 3.11 Given any graph G and K ≥ c(G), the ai-CR game played on G with K cops has a value v^(K) which satisfies

v^(K) = lim_{m→∞} v^(K)_m,

where v^(K)_m is the value of the m-truncated game. Furthermore, there exists a strategy s*_C which is optimal for C and, for every ε > 0, there exist an m_ε and a strategy s^(ε)_R which guarantees R an expected payoff of at least v^(K) − ε.

Having established the existence of v^(K) we have the following.

The Node di-CR Game
In this game Y_t is unobservable and drunk; call this the "regular" di-CR game and also introduce the m-truncated di-CR game. Both are one-player games or, equivalently, Y_t is a partially observable MDP (POMDP) [33]. The target function is

E[T^(K)(s_C | G)], (13)

which is exactly the same as (9), but now Y_t is unobservable. (13) can be approximated by its m-truncated counterpart

E[T^(K)_m(s_C | G)]. (14)

The expected values in (13)-(14) are well defined for every s_C. C must select a strategy s_C ∈ S^(K)_C which minimizes (13). This is a typical infinite horizon, undiscounted POMDP problem [33] for which the following holds.
Theorem 3.13 Given any graph G and K ∈ N, for the di-CR game played on G with K cops there exists a strategy s^(K)_C which minimizes (13).

Hence we can introduce the following.
Definition 3.14 The drunk invisible capture time of G is dct_i(G) = E[T^(K)(s^(K)_C | G)] with K = c(G).

The edge counterparts of all the above quantities are defined analogously, through the line graph L(G); in general, all of these "edge CR parameters" will differ from the corresponding "node CR parameters".

Cost of Visibility in the Node CR Games
As already remarked, we expect that ai-CR is more difficult (from C's point of view) than av-CR (and similarly for the drunk counterparts of these games). We quantify this statement by introducing the cost of visibility (COV): for every graph G we define the adversarial and drunk COV by

H_a(G) = ct_i(G) / ct(G) and H_d(G) = dct_i(G) / dct(G),

where ct_i(G) and dct_i(G) denote the ai-CR and di-CR capture times. The following computations on the star graph S_{N,1} (a center node 0 connected to N leaf nodes 1, 2, ..., N) show that H_a(G) can become arbitrarily large.

Proof. (i) Computing ct(S_{N,1}). In av-CR, for every N ∈ N we have ct(S_{N,1}) = 1: the cop starts at X_0 = 0, the robber starts at some Y_0 = u ≠ 0 and, at t = 1, he is captured by the cop moving into u; i.e., ct(S_{N,1}) ≤ 1; on the other hand, since there are at least two vertices (N ≥ 1), clearly ct(S_{N,1}) ≥ 1.
(ii) Computing ct_i(S_{N,1}). Let us now show that in ai-CR we have ct_i(S_{N,1}) = N. C places the cop at X_0 = 0 and R places the robber at some Y_0 = u ≠ 0. We will obtain ct_i(S_{N,1}) by bounding it from above and below. For an upper bound, consider the following C strategy. Since C does not know the robber's location, he must check the leaf nodes one by one. So at every odd t he moves the cop into some u ∈ {1, 2, ..., N} and at every even t he returns to 0. Note that R cannot change the robber's original position; in order to do this, the robber must pass through 0, but then he will be captured by the cop (who either is already in 0 or will be moved into it just after the robber's move). Hence C can choose the nodes he will check on odd turns with uniform probability and without repetitions. Equivalently, we can assume that the order in which nodes are chosen by C is selected uniformly at random from the set of all permutations; further, we assume that R (who does not know this order) starts at some Y_0 = u ∈ {1, ..., N}. Then we have

E[T] = Σ_{k=1}^N (2k − 1)/N = N,

since the robber's leaf is the k-th one checked with probability 1/N and is then visited at time 2k − 1. For a lower bound, consider the following R strategy. The robber is initially placed at a random leaf, different from the one selected by C (if the cop did not start at the center). Knowing this, the best C strategy is to check (in any order) all leaves without repetition. If the cop starts at the center, we get exactly the same sum as for the upper bound; otherwise the expected capture time can only be larger. Hence ct_i(S_{N,1}) = N and H_a(S_{N,1}) = N, which can be made arbitrarily large.
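The upper-bound expectation can be checked mechanically. The sketch below (names are ours) evaluates, with exact rational arithmetic, the expected capture time over the cop's uniformly random checking order:

```python
from fractions import Fraction

# Check of the upper bound for ct_i(S_{N,1}): the cop checks the N
# leaves in uniformly random order, one leaf every two turns, so a
# robber hidden at a uniformly random leaf is found at time 2k - 1,
# with k uniform on {1, ..., N}.
def expected_capture(N):
    return sum(Fraction(2 * k - 1, N) for k in range(1, N + 1))
```

The sum telescopes to (1/N) · N^2 = N, so the expectation is exactly N, as claimed.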
We next consider the drunk variants, played on the "generalized star" S_{N,M}, which consists of N rays (paths) of length M attached to a common center; in what follows the asymptotics are with respect to M, and N is considered a fixed constant.

Proof. (i) Computing dct(S_{N,M}). We will first show that, for any N ∈ N, we have dct(S_{N,M}) = (1 + o(1)) M/2 (recall that the parameter N is a fixed constant whereas M → ∞). Suppose that the cop starts on the i-th ray, at distance (1 + o(1))cM from the center (for some constant c ∈ [0, 1]). The robber starts at a random vertex. It follows that for any j such that 1 ≤ j ≤ N, the robber starts on the j-th ray with probability (1 + o(1))/N. It is a straightforward application of Chernoff bounds to show that with probability 1 − o(1) the robber will not move by more than o(M) in the next O(MN) = O(M) steps, which suffice to finish the game. Hence, the expected capture time is easy to calculate.
• With probability (1 − c + o(1))/N, the robber starts on the same ray as the cop but farther away from the center. Conditioning on this event, the expected capture time is M(1 − c + o(1))/2.
• With probability (c + o(1))/N, the robber starts on the same ray as the cop but closer to the center. Conditioning on this event, the expected capture time is M(c + o(1))/2.
• With probability (N − 1)(1 + o(1))/N, the robber starts on a different ray. Conditioning on this event, the cop (who sees the robber) walks to the center and then out to the robber, for an expected capture time of (c + 1/2 + o(1))M.
It follows that the expected capture time is

(1 + o(1)) [ ((1 − c)/N) · (1 − c)M/2 + (c/N) · cM/2 + ((N − 1)/N) · (c + 1/2)M ],

which is minimized for c = 0, giving dct(S_{N,M}) = (1 + o(1)) M/2.

(ii) Computing dct_i(S_{N,M}). The initial placement of the robber is the same as in the visible variant, that is, uniform. However, since the robber is now invisible, C has to check all rays. As before, by Chernoff bounds, with probability at least 1 − e^{−cM^{1/3}} (for some constant c > 0, not to be confused with the placement parameter above), during O(M) steps the robber always stays within distance O(M^{2/3}) of its initial position. If the robber starts at distance ω(M^{2/3}) from the center, he will thus, with probability at least 1 − e^{−cM^{1/3}}, not change his ray during O(M) steps. Otherwise, he might change from one ray to another with larger probability, but this happens only when the robber starts at distance O(M^{2/3}) from the center, and thus with probability at most O(M^{−1/3}). Keeping these remarks in mind, let us examine "reasonable" C strategies; it turns out there are three.

(ii.1) Suppose C starts at the end of one ray (chosen arbitrarily), goes to the center, and then successively checks the remaining rays without repetition; then, with probability at least 1 − O(M^{−1/3}), the robber does not switch rays and is caught. In this case the capture time is calculated as follows:

• With probability (1 + o(1))/N, the robber starts on the same ray as the cop. Conditioning on this event, the expected capture time is (1 + o(1))M/2.
• With probability (N − 1)(1 + o(1))/N, the robber starts on one of the other rays. Conditioning on this event, the cop reaches the center at time M and then needs, on average, (N − 2)M + M/2 additional steps, for an expected capture time of (1 + o(1))(N − 1/2)M.
Otherwise, if the robber is not caught, C just randomly checks rays: starting from the center, C chooses a random ray, goes to the end of the ray, returns to the center, and continues like this until the robber is caught. The expected capture time in this case is O(M) (for fixed N, the expected number of ray checks is N, each taking at most 2M steps). Since this case occurs with probability O(M^{−1/3}), the contribution of the case where the robber switches rays is o(M); therefore, for this strategy of C, the expected capture time is

(1 + o(1)) [ (1/N) · M/2 + ((N − 1)/N) · (N − 1/2)M ] = (1 + o(1)) (N − 3/2 + 1/N) M.

(ii.2) Now suppose C starts at the center, rather than at the end of a ray, and checks all N rays from there. By the same arguments as before, the capture time is

(1 + o(1)) (N − 1/2) M,

which is worse than in the case when starting at the end of a ray.
(ii.3) Finally, suppose the cop starts at distance cM from the center, for some c ∈ [0, 1]. If he first goes to the center and then checks all rays (with the ray he came from checked last), the capture time is minimized for c = 1. If instead C goes first to the end of his ray and then to the center, the capture time is, for N ≥ 2, again minimized for c = 1 (in fact, for N = 2 the two numbers are equal). In short, the smallest capture time is achieved when C starts at the end of some ray, which completes the proof.

Cost of Visibility in the Edge CR Games
The cost of visibility in the edge CR games is defined analogously to that of node games. Clearly, for every G we have H a (G) ≥ 1 and H d (G) ≥ 1. The following theorems show that in fact both H a (G) and H d (G) can become arbitrarily large. To prove these theorems we will use the previously introduced star graph S N,1 and its line graph which is the clique K N . These graphs are illustrated in Figure 2 for N = 6.
Proof. We have H_a(S_{N,1}) = ct_i(K_N)/ct(K_N) and, since N ≥ 2, clearly ct(K_N) = 1. Let us now compute ct_i(K_N).
For an upper bound on ct i (K N ), C might just move to a random vertex. If the robber stays still or if he moves to a vertex different from the one occupied by C, he will be caught in the next step with probability 1/(N − 1), and thus an upper bound on the capture time is N − 1.
For a lower bound, suppose that the robber always moves to a vertex chosen uniformly at random among the vertices different from the one occupied by C, possibly including the vertex he occupies now (that is, with probability 1/(N − 1) he stands still); after his turn, he is at each vertex different from the one occupied by C with probability 1/(N − 1). Hence C is forced to move and, since he has no idea where to go, the best he can do is also move randomly; the robber is then caught with probability 1/(N − 1) per step, yielding a lower bound on the capture time of N − 1. Therefore ct_i(K_N) = N − 1 and H_a(S_{N,1}) = N − 1, which completes the proof.

A similar computation applies to the drunk robber.

Proof. This is quite similar to the adversarial case. We have H_d(S_{N,1}) = dct_i(S_{N,1})/dct(S_{N,1}) = dct_i(K_N)/dct(K_N). Clearly dct(K_N) = 1 − 1/N (with probability 1/N the robber selects the same starting vertex as the cop and is caught before the game actually starts; otherwise he is caught in the first round).
For dct_i(K_N), it is clear that the strategy of constantly moving is best for the cop, as it gives two chances to catch the robber in each round (either by the cop moving onto him, or by the robber afterwards moving onto the cop). It does not matter where the cop moves as long as he keeps moving, so we may assume that he starts at some vertex v, moves to some other vertex w in the first round, then comes back to v, and oscillates like this until the end of the game. When the cop moves to another vertex, the probability that the robber is there is 1/(N − 1). If the robber is still not caught, he moves to a random place, thereby selecting the vertex occupied by the cop with probability 1/(N − 1). Hence, the probability of catching the robber in one step is

1/(N − 1) + (1 − 1/(N − 1)) · 1/(N − 1) = (2N − 3)/(N − 1)^2.

Thus, this time the capture time is a geometric random variable with success probability (2N − 3)/(N − 1)^2, so that dct_i(K_N) = (N − 1)^2/(2N − 3) and

H_d(S_{N,1}) = dct_i(K_N)/dct(K_N) = N(N − 1)/(2N − 3),

which can become arbitrarily large by appropriate choice of N.
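The arithmetic behind this geometric capture time is easy to verify. A small sketch (names are ours) computes the one-step capture probability exactly:

```python
from fractions import Fraction

# Per-step capture probability on K_N for a constantly moving cop: he
# lands on the robber with probability 1/(N-1) and, failing that, the
# robber's own random move lands on the cop with the same probability.
def step_capture_prob(N):
    p = Fraction(1, N - 1)
    return p + (1 - p) * p

def expected_capture_time(N):
    # mean of a geometric random variable with this success probability
    return 1 / step_capture_prob(N)
```

The expected capture time (N − 1)^2/(2N − 3) grows roughly like N/2, which is the source of the unbounded cost of visibility on these graphs.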

Algorithms for COV Computation
For graphs of relatively simple structure (e.g., paths, cycles, full trees, grids) capture times and optimal strategies can be found by analytical arguments [22,23]. For more complicated graphs, an algorithmic solution becomes necessary. In this section we present algorithms for the computation of capture time in the previously introduced node CR variants. The same algorithms can be applied to the edge variants by replacing G with L (G).

Algorithm for Adversarial Robber
The av-CR capture time ct(G) can be computed in polynomial time. In fact, stronger results have been presented by Hahn and MacGillivray: in [14] they give an algorithm which, given K, computes for every (x, y) ∈ V^2 the following: 1. C(x, y), the optimal game duration when the cop/robber configuration is (x, y) and it is C's turn to play; 2. R(x, y), the optimal game duration when the cop/robber configuration is (x, y) and it is R's turn to play. The av-CR capture time can then be computed as ct(G) = min_{x∈V} max_{y∈V} C(x, y); the optimal strategies σ_C, σ_R can also be easily obtained from the optimality equations, as will be seen a little later. We have presented in [23] an implementation of Hahn and MacGillivray's algorithm, which we call CAAR (Cops Against Adversarial Robber). Below we present the algorithm for the case of a single cop (the generalization to more than one cop is straightforward).
The optimal memoryless strategies σ_C(x, y) and σ_R(x, y) can be computed for every position (x, y) by letting σ_C(x, y) (resp. σ_R(x, y)) be a node x' ∈ N[x] (resp. y' ∈ N[y]) which achieves the minimum in (15) (resp. the maximum in (16)). The capture time is then computed from ct(G) = min_{x∈V} max_{y∈V} C(x, y).
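For concreteness, here is a hedged Python sketch of the single-cop dynamic programme. The update rules are our own rendering of the optimality equations (15)-(16) (counting one round per cop move, with C and R values swept until they stabilize), not the exact pseudocode of [14,23]:

```python
import math

# Sketch of CAAR for one cop. C[(x, y)] (cop to move) and R[(x, y)]
# (robber to move) approximate the optimal game durations; on cop-win
# graphs the sweeps below reach the fixed point.
def caar(adj):
    V = sorted(adj)
    nbhd = {v: set(adj[v]) | {v} for v in V}       # closed neighbourhoods
    C = {(x, y): 0 if x == y else math.inf for x in V for y in V}
    R = dict(C)
    for _ in range(len(V) ** 2):                   # enough sweeps to settle
        for x in V:
            for y in V:
                if x == y:
                    continue
                # cop: capture at once if possible, else best continuation
                C[(x, y)] = 1 + min(0 if nx == y else R[(nx, y)]
                                    for nx in nbhd[x])
                # robber: move to the safest node of N[y]
                R[(x, y)] = max(C[(x, ny)] for ny in nbhd[y])
    return C

def capture_time(adj):
    C, V = caar(adj), sorted(adj)
    return min(max(C[(x, y)] for y in V) for x in V)
```

On a three-node path the cop starting at the centre captures in one round; on a four-node path two rounds are needed against an optimal robber.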

Algorithm for Drunk Robber
For any given K, value iteration can be used to determine both dct(G, K) and the optimal strategy σ^(K)_C(x, y); one implementation is our CADR (Cops Against Drunk Robber) algorithm [23], which is a typical value-iteration [33] MDP algorithm; alternatively, CADR can be seen as an extension of the CAAR idea to dv-CR. Below we present the algorithm for the case of a single cop (the generalization to more than one cop is straightforward).
The Cops Against Drunk Robber (CADR) Algorithm

01 For (x, y) ∈ V^2
02   If (x, y) ∈ V^2_D
03     C^(0)(x, y) = 0
04   Else
05     C^(0)(x, y) = ∞
06   EndIf
07 EndFor
08 For i = 1, 2, ...
09   For (x, y) ∈ V^2
10     C^(i)(x, y) = min_{x'∈N[x]} (1 + Σ_{y'} P((x', y) → (x', y')) C^(i−1)(x', y'))
11   EndFor
12   If max_{(x,y)∈V^2} |C^(i)(x, y) − C^(i−1)(x, y)| < ε
13     Break
14   EndIf
15 EndFor

The algorithm operates as follows (again we use C(x, y) to denote the optimal expected game duration when the game position is (x, y)). In lines 01-07, C^(0)(x, y) is initialized to ∞, except for the "diagonal" positions (x, y) ∈ V^2_D. In the main loop (lines 08-15), C^(i)(x, y) is computed (line 10) by letting the cop move to the position which achieves the smallest expected capture time (P((x', y) → (x', y')) in line 10 denotes the transition probability from (x', y) to (x', y')). This process is repeated until the maximum change |C^(i)(x, y) − C^(i−1)(x, y)| is smaller than the termination criterion ε, at which point the algorithm exits the loop and terminates. This is a typical value iteration MDP algorithm [33]; the convergence of such algorithms has been studied by several authors, in various degrees of generality [8,11,18]. A simple yet strong result, derived in [11], uses the concept of a proper strategy: a strategy is called proper if it yields finite expected capture time. It is proved in [11] that, if a proper strategy exists for graph G, then CADR-like algorithms converge. In the case of dv-CR we know that C has a proper strategy: it is the random walking strategy s^(K)_C mentioned in Theorem 3.10. Hence CADR converges and, in the limit, C = lim_{i→∞} C^(i) satisfies the optimality equations; the optimal memoryless strategy σ^(K)_C(x, y) is obtained by letting the cop move, at each position, to a minimizer in these equations.
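A minimal runnable version of the same value iteration can be written directly. In this sketch (our own names, with the path 0-1-2 hard-coded) we initialize at 0 rather than ∞ — an assumption that keeps the floating-point arithmetic finite while the iterates still increase towards the fixed point:

```python
import numpy as np

# CADR-style value iteration for one cop on the path 0-1-2:
# c[x, y] approximates the optimal expected capture time from
# cop node x, robber node y.
adj = {0: [1], 1: [0, 2], 2: [1]}
n = 3

def cadr(eps=1e-9):
    c = np.zeros((n, n))
    while True:
        new = np.zeros_like(c)
        for x in range(n):
            for y in range(n):
                if x == y:
                    continue                       # already captured
                best = np.inf
                for nx in adj[x] + [x]:            # slide or stay
                    if nx == y:
                        best = min(best, 1.0)      # cop steps onto robber
                        continue
                    # the drunk robber then moves to a uniform neighbour;
                    # moving onto the cop (ny == nx) ends the round
                    exp = sum(0.0 if ny == nx else c[nx, ny]
                              for ny in adj[y]) / len(adj[y])
                    best = min(best, 1.0 + exp)
                new[x, y] = best
        if np.max(np.abs(new - c)) < eps:
            return new
        c = new
```

On this tiny graph every non-diagonal position has expected capture time 1: the cop either steps onto the robber or waits at the centre, into which the drunk robber is forced to walk.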

Algorithms for Adversarial Robber
We have not been able to find an efficient algorithm for solving the ai-CR game. Several algorithms for imperfect-information stochastic games could be used to this end, but we have found that they are practical only for very small graphs.

Algorithm for Drunk Robber
In the case of the drunk invisible robber we again use a game-tree search algorithm with pruning, for which some analytical justification can be provided. We call this the Pruned Cop Search (PCS) algorithm. Before presenting the algorithm we introduce some notation and prove a simple fact about expected capture time. We limit ourselves to the single-cop case, since the extension to more cops is straightforward.
We let x = x_0 x_1 x_2 . . . be an infinite history of cop moves. Letting t be the current time step, the probability vector p(t) contains the probabilities of the robber being in node v ∈ V or in the capture state n + 1; more specifically: p(t) = [p_1(t), . . . , p_v(t), . . . , p_n(t), p_{n+1}(t)] and p_v(t) = Pr(y_t = v | x_0 x_1 . . . x_t). Hence p(t) depends (as expected) on the finite cop history x_0 x_1 . . . x_t. The expected capture time is denoted by C(x) = E(T | x); the conditioning is on the infinite cop history. The PCS algorithm works because E(T | x) can be approximated from a finite part of x, as explained below. We have

C(x) = E(T | x) = Σ_{t=0}^∞ Pr(T > t | x).  (18)

x in the conditioning is the infinite history x = x_0 x_1 x_2 . . . . However, for every t we have Pr(T > t | x) = 1 − p_{n+1}(t), which depends only on the finite prefix x_0 x_1 . . . x_t, where p_{n+1}(τ) is the probability that the robber is in the capture state n + 1 at time τ (the dependence on x_0 x_1 . . . x_τ is suppressed, for simplicity of notation). Let us define

C^(t)(x_0 x_1 . . . x_t) = Σ_{τ=0}^{t} (1 − p_{n+1}(τ)).  (19)

Then for all t we have C^(t)(x_0 . . . x_t) ≤ C^(t+1)(x_0 . . . x_{t+1}) ≤ C(x). Update (19) can be computed using only the previous cost C^(t−1)(x_0 x_1 . . . x_{t−1}) and the (previously computed) probability vector p(t). While C^(t)(x_0 . . . x_t) ≤ C(x), we hope that (at least for the "good" histories) we have

lim_{t→∞} C^(t)(x_0 . . . x_t) = C(x).  (20)

This actually works well in practice. The PCS algorithm is given below in pseudocode. We have introduced a structure S with fields S.x, S.p, S.C = C(S.x). Also we denote concatenation by the | symbol, i.e., x|v = x_0 x_1 . . . x_t v.

The Pruned Cop Search (PCS) Algorithm
01 x = x_0 (the initial cop position), p = p(0) (the initial robber distribution)
02 S.x = x, S.p = p, S.C = 0
03 𝒮 = {S}
04 t = 0, C_best^old = ∞
05 While True
06   𝒮′ = ∅
07   For All S ∈ 𝒮
08     x = S.x, p = S.p, C = S.C
09     For All v ∈ N[x_t]
10       x′ = x|v
11       p′ = Update(p, v)
12       C′ = Cost(C, p′)
13       S′.x = x′, S′.p = p′, S′.C = C′
14       𝒮′ = 𝒮′ ∪ {S′}
15     EndFor
16   EndFor
17   𝒮 = Prune(𝒮′, J_max)
18   (x_best, C_best) = Best(𝒮)
19   If |C_best − C_best^old| < ε
20     Break
21   Else
22     C_best^old = C_best
23     t ← t + 1
24   EndIf
25 EndWhile
Output: x_best, C_best = C(x_best).
The PCS algorithm operates as follows. At initialization (lines 01-04), we create a single S structure (with S.x being the initial cop position, S.p the initial, uniform robber probability vector and S.C = 0) which we store in the set 𝒮. Then we enter the main loop (lines 05-25), where we pick each available cop sequence x of length t (line 08). Then, in lines 09-15 we compute, for all legal extensions x′ = x|v (where v ∈ N[x_t]) of length t + 1 (line 10), the corresponding p′ (line 11) and C′ (by the subroutine Cost at line 12). We store these quantities in a new structure S′, which is placed in the temporary storage set 𝒮′ (lines 13-14). After exhausting all possible extensions of length t + 1, we prune the temporary set 𝒮′, retaining only the J_max best cop sequences (this is done in line 17 by the subroutine Prune, which computes "best" in terms of smallest C(x)); the retained sequences become the new 𝒮. Finally, the subroutine Best in line 18 computes the overall smallest expected capture time C_best = C(x_best). The procedure is repeated until the termination criterion |C_best − C_best^old| < ε is satisfied. As explained above, the criterion is expected to be eventually satisfied because of (20).
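A compact Python sketch may clarify the bookkeeping of this pruned search. Everything below is an illustrative reconstruction under stated assumptions: the cop moves first in each round, capture occurs when cop and robber occupy the same node, the initial robber distribution is uniform, and `pcs`/`step` are hypothetical names standing in for the Update/Cost/Prune/Best subroutines.

```python
import heapq

def pcs(adj, start, J_max=10, eps=1e-6, max_t=1000):
    # Pruned-search sketch for a single cop chasing a drunk, invisible
    # robber. The belief p is kept over non-captured robber positions
    # only, so sum(p.values()) = Pr(T > t); the running cost accumulates
    # these survival probabilities, as in the truncated sum C^(t).
    V = sorted(adj)
    n = len(V)
    N = {v: [v] + sorted(adj[v]) for v in V}  # closed neighborhood N[v]

    def step(p, v):
        # One round with the cop at v: mass at v is captured; remaining
        # mass random-walks on N[y], and walking onto v is also capture.
        q = dict.fromkeys(V, 0.0)
        for y, mass in p.items():
            if y == v:
                continue
            for yp in N[y]:
                if yp != v:
                    q[yp] += mass / len(N[y])
        return q

    p0 = {v: (0.0 if v == start else 1.0 / n) for v in V}
    beam = [([start], p0, 1.0 - 1.0 / n)]  # (history, belief, C^(t))
    C_old = float("inf")
    for _ in range(max_t):
        ext = []
        for hist, p, C in beam:
            for v in N[hist[-1]]:  # all legal one-step extensions
                q = step(p, v)
                ext.append((hist + [v], q, C + sum(q.values())))
        # Prune: keep the J_max extensions with smallest partial cost
        beam = heapq.nsmallest(J_max, ext, key=lambda s: s[2])
        C_best = beam[0][2]
        if abs(C_best - C_old) < eps:
            break
        C_old = C_best
    return beam[0][0], beam[0][2]
```

Because each extension adds a nonnegative survival probability, the partial costs are nondecreasing in t, and the termination test fires once the retained histories have driven the surviving robber mass (essentially) to zero.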

Experimental Estimation of The Cost of Visibility
We now present numerical computations of the drunk cost of visibility for graphs which are not amenable to analytical computation 11 . In Section 6.1 we deal with node games and in Section 6.2 with edge games.

Experiments with Node Games
Since H_d(G) = dct_i(G)/dct(G), we use the CADR algorithm to compute dct(G) and the PCS algorithm to compute dct_i(G). We use graphs G obtained from indoor environments, which we represent by their floorplans. In Fig. 3 we present a floorplan and its graph representation. The graph is obtained by decomposing the floorplan into convex cells, assigning each cell to a node and connecting two nodes by an edge whenever the corresponding cells are connected by an open space.
We have written a script which, given some parameters, generates random floorplans and their graphs. Every floorplan consists of a rectangle divided into orthogonal "rooms". If each internal room were connected to its four nearest neighbors we would get an M × N grid G′. However, we randomly generate a spanning tree G_T of G′ and initially introduce doors only between rooms which are connected in G_T. Our final graph G is obtained from G_T by iterating over all missing grid edges and adding each one with probability p_0 ∈ [0, 1]. Hence each floorplan is characterized by three parameters: M, N and p_0. We use the following (M, N) pairs: (1, 30), (2, 15), (3, 10), (4, 7), (5, 6). Four of these pairs give a total of 30 nodes and the pair (M = 4, N = 7) gives n = 28 nodes; as M/N increases, we progress from a path to a nearly square grid. For each (M, N) pair we use five p_0 values: 0.00, 0.25, 0.50, 0.75, 1.00; note the progression from a tree (p_0 = 0.00) to a full grid (p_0 = 1.00). For each triple (M, N, p_0) we generate 50 floorplans, obtain their graphs and for each graph G we compute dct(G) using CADR, dct_i(G) using PCS and H_d(G) = dct_i(G)/dct(G); finally we average H_d(G) over all generated graphs. In Fig. 4 we plot dct(G) as a function of the probability p_0; each plotted curve corresponds to an (M, N) pair. Similarly, in Fig. 5 we plot dct_i(G) and in Fig. 6 we plot H_d(G). Note that L(G) has more nodes and edges than G; hence the loss of visibility makes the edge game significantly harder than the node game. There is one exception to the above remarks, namely the case (M, N) = (1, 30); in this case both G and L(G) are paths and H̄_d(G) is essentially equal to H_d(G) (as can be seen by comparing Figures 6 and 10).
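The floorplan-to-graph generation step can be sketched as follows. This is an illustrative reimplementation, not the authors' script; `floorplan_graph` is a hypothetical name, and the spanning tree is drawn with the Aldous-Broder random walk (the text does not specify how G_T is sampled).

```python
import random

def floorplan_graph(M, N, p0, seed=0):
    # Sketch of the generator: start from the M x N grid G', keep a
    # random spanning tree G_T (doors along tree edges only), then add
    # each missing grid edge back independently with probability p0.
    rng = random.Random(seed)
    nodes = [(i, j) for i in range(M) for j in range(N)]
    grid_edges = {frozenset({(i, j), (a, b)})
                  for (i, j) in nodes
                  for (a, b) in ((i + 1, j), (i, j + 1))
                  if a < M and b < N}
    # Aldous-Broder walk: first-entry edges form a uniform spanning tree.
    visited, tree, cur = {nodes[0]}, set(), nodes[0]
    while len(visited) < len(nodes):
        i, j = cur
        nbrs = [(a, b) for (a, b) in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                if 0 <= a < M and 0 <= b < N]
        nxt = rng.choice(nbrs)
        if nxt not in visited:
            visited.add(nxt)
            tree.add(frozenset({cur, nxt}))
        cur = nxt
    edges = set(tree) | {e for e in grid_edges - tree if rng.random() < p0}
    adj = {v: set() for v in nodes}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    return adj
```

With p0 = 0 the result is exactly the tree G_T (n − 1 edges); with p0 = 1 it is the full grid G′ with M(N − 1) + N(M − 1) edges.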

Conclusion
In this paper we have studied two versions of the cops and robber game: one played on the nodes of a graph and the other on its edges. For each version, we studied four variants, obtained by changing the visibility and adversariality assumptions regarding the robber; hence we have a total of eight CR games. For each of these we have rigorously defined the corresponding optimal capture time, using game-theoretic and probabilistic tools.
Then, for the node games we have introduced the adversarial cost of visibility H(G) = ct_i(G)/ct(G) and the drunk cost of visibility H_d(G) = dct_i(G)/dct(G). These ratios quantify the increase in difficulty of the CR game when the cop is no longer aware of the robber's position (a situation which occurs often in mobile robotics).
We have defined analogous quantities H̄(G) and H̄_d(G), the corresponding ratios of invisible to visible capture times, for the edge CR games.
We have studied H(G) and H_d(G) analytically and have established that both can become arbitrarily large. We have established similar results for H̄(G) and H̄_d(G). In addition, we have studied H_d(G) and H̄_d(G) by numerical experiments, which support both the game-theoretic results of the current paper and the analytical computations of capture times presented in [23,22].