Opponent-Aware Planning with Admissible Privacy Preserving for UGV Security Patrol under Contested Environment

Abstract: Unmanned ground vehicles (UGVs) have been widely used in security patrol. The existence of two potential opponents, the malicious teammate (cooperative) and the hostile observer (adversarial), highlights the importance of privacy-preserving planning in contested environments. In a cooperative setting, the disclosure of private information to malicious teammates can be restricted. In an adversarial setting, obfuscation can be added to control the observability of the adversarial observer. In this paper, we attempt to generate opponent-aware privacy-preserving plans, focusing on two questions: what is opponent-aware privacy-preserving planning, and how can we generate opponent-aware privacy-preserving plans? We first define the opponent-aware privacy-preserving planning problem, in which the generated plans preserve admissible privacy. Then, we demonstrate how to generate opponent-aware privacy-preserving plans: the search-based planning algorithms are restricted to the public information shared among the cooperators, and the observation of the adversarial observer is purposefully controlled by exploiting decoy goals and diverse paths. Finally, we model the security patrol problem, in which the UGV restricts information sharing and attempts to obfuscate its goal. Simulation experiments with privacy leakage analysis and an indoor robot demonstration show the applicability of the proposed approaches.


Introduction
With the development of intelligent unmanned system technology, unmanned ground vehicles (UGVs) have become rugged enough for the harshest military use, such as executing monitoring tasks in harsh and complex urban environments [1]. As our world becomes increasingly well connected, there is a growing need to enable UGVs to cooperate in generating plans for security patrol. For example, iRobot's PackBot has played a critical role in providing situational awareness for anti-terrorist operations [2].
Several approaches have been proposed in recent years to address the conundrum of privacy preservation by controlling privacy leakage under different requirements in contested environments. One of them is differential privacy [3], which adds appropriate noise to the transmitted state so that the opponent can recover the true state of the transmitted signal only up to a predetermined level of accuracy. Another approach uses cryptography for secure multi-party computation (MPC). In [4,5], the authors encrypt the messages with a public-key homomorphic cryptosystem and apply techniques (e.g., random masking and random permutation) to protect the agents' privacy, so that the encrypted messages can be exchanged among the agents in various ways [6].

In the security patrol scenario considered here (Figure 1), the UGVs patrol two zones, each with four candidate checkpoints, and the supply center provides support for the UGVs.
Regarding urban security patrol, some checkpoints located around the urban trunk road are high risk. Thus, it is feasible to deploy UGVs to patrol such checkpoints regularly and collect information (e.g., images, video, etc.). Although UGVs have become quite ubiquitous in patrols with logging and tracking capabilities, they mostly operate at risk: a hostile observer may constantly monitor the task execution and gain access to the UGVs' data and actions. The challenge of privacy preservation arises because much of this information is private and the UGVs are unwilling to share it, which drives us to compute privacy-preserving plans that protect privacy when executed in cooperative and adversarial environments.
In this paper, we address the problem of opponent-aware privacy-preserving planning for security patrol and attempt to answer the following questions: what is opponent-aware privacy-preserving planning, and how can we generate opponent-aware privacy-preserving plans? Our contribution lies in the opponent-aware privacy-preserving planning architecture. In a cooperative setting, the search-based planning method is restricted to the public information shared by the cooperative agents, whereas in an adversarial setting, the observation of the adversary is purposefully controlled by exploiting decoy goals and diverse paths. Finally, simulation experiments with privacy leakage analysis and an indoor robot demonstration show the applicability of our proposed approaches.
The rest of this paper is organized as follows. In Section 2, related work on privacy, security, and metrics is presented. In Section 3, we decompose the opponent-aware privacy-preserving planning problem into two subproblems from different perspectives. In Section 4, experimental evaluations of plan generation and information leakage analysis are presented. In Section 5, we conclude this paper and point out future directions.

Privacy and Security Assumption
Privacy and Security: Many privacy models have been adopted in multi-agent planning according to three different criteria: the information model (imposed privacy [19], induced privacy [20]), the information-sharing scheme (MA-STRIPS [19], subset privacy [21]), and practical privacy guarantees (no privacy [22], weak privacy [23], object cardinality privacy [24], and strong privacy [9]). Privacy can be divided into different categories, such as agent privacy, model privacy, decision privacy, topology privacy, and constraint privacy [4,14]. Here we introduce some widely used types of privacy.
Definition 1 (Agent privacy). No agent should be able to recognize the identity or existence of another agent.
Agent privacy can be achieved by employing anonymous or coded names; for example, an agent would not want the opponents to know its identity or existence. These privacy notions are related to the concept of semantic security from cryptography [27], where secure plans build on the concept of independent inputs [28]. A secure plan is always private; security imposes an additional constraint (all possible goals must result in the same observations) on the privacy problem [29].
Security Assumption: In [30], the authors define the notion of privacy-preserving planning based on secure MPC and provide an analysis of privacy leakage in multi-agent planning. In the secure multi-party computation literature, many assumptions specify the properties of the agent, the environment, and the algorithm [10,28,31].
Assumption 1 (Adversary model). An honest-but-curious adversary is passive and follows the algorithm and the protocol correctly, but may glean information from the execution and the communicated data to learn private information. A malicious adversary can actively deviate from the protocol specification.
Assumption 2 (Algorithm known). The adversary has access to the algorithm and knows how it works. The agent should not rely on the secrecy of the algorithmic mechanism itself.

Assumption 3 (Input independent). The adversary can rerun the algorithm, setting different goals as the real goal, to check the variability of the output.

Assumption 4 (FIFO). The adversary receives an observation only after the actor takes the corresponding action to reach the corresponding state, and observations arrive in the order in which they are emitted by the plan execution.
As is usual in cryptography, these assumptions do not take the adversary's recognition model into consideration, which differs from the practice of the artificial intelligence (AI) community.

Privacy-Preserving Planning
The planning problem of privacy preservation can be modeled as a multi-agent planning (MAP) problem with a privacy-preserving requirement. MAP comes in different types, such as deterministic MAP (DMAP) [19,32], interactive partially observable Markov decision processes (I-POMDPs) [33], and decentralized POMDPs (Dec-POMDPs) [34]. Regarding privacy, there are many synonymous concepts in the recent literature, all of which aim at generating obfuscated behavior, such as deception, security, and obfuscation, as shown in Table 1. A secure plan is always private; a deceptive plan is always obfuscating, but may or may not be dissimulating [29]. A simple illustration of the different strategies is shown in Figure 2.

Table 1. Some synonymous concepts of privacy.

Concepts | Main Contributions
Obfuscation | k-ambiguous and d-diverse [35]; one candidate goal [36]; secure MAFS [9]
Privacy | privacy leakage [10]; plan set intersection [11]; privacy-preserving policy iteration [4]
Security | equidistant states [28]; last deceptive point [37,38]; deceptive shortest path [39]
Deception | bounded deception [40]; hide intention [41]; λ-deception [42]; deceptive adversary [43]

In a cooperative environment, many multi-agent planners have been proposed to address privacy-preserving planning problems, such as MAFS (multi-agent forward search) [30], MADLA (multi-agent distributed and local asynchronous) [44], and PSM (planning state machine) [11]. In [11], the authors proposed a secure planner for multi-agent planning, but it is impractical because it must compute all possible solutions. In [9], the authors introduce Secure-MAFS, a modified version of the multi-agent forward search algorithm MAFS [30], implemented with an equivalent macro-sending technique [24]. Some privacy-guaranteeing planning algorithms are provided in [9], but they are restricted to very special cases.
In an adversarial environment, the adversary implicitly uses the behavioral cues signaled by the actors during plan execution and performs diagnosis of their internal information based on the resulting observations. Recently, there has been some interest in exploring privacy preservation [36], goal obfuscation [28,35], deception [37,38], intention hiding [41], etc. In [35], Kulkarni et al. attempted to make plans with k-ambiguous goals, but these plans are not guaranteed to be secure. In [36], Keren et al. proposed to preserve privacy by keeping the goal ambiguous for as long as possible, but there was only one candidate goal and one partially obfuscated plan. In [38], Masters et al. applied some deceptive strategies to path planning, but these do not support deception when the adversary knows the explicit model. In [28], Kulkarni et al. proposed to securely obfuscate the real goal by making all candidate goals equally likely for as long as possible, but the heuristic deployed makes the planner incomplete. All these studies employ goal or plan recognition modules.

Information Leakage Metric
Although the key motivation for privacy-preserving planning is preserving privacy, some private information will be leaked during planning, which means it is impossible to achieve complete privacy. If a malicious teammate directly receives any of the private information, or can indirectly deduce it from the communicated public information, the private information is leaked. To evaluate the privacy leakage, we build on the foundations of quantitative information flow [45]. The leakage of private information is based on the uncertainty of the adversary about the input. Here we use the min-entropy (an instance of Rényi entropy [46]) as the measure of the privacy information leakage (PIL):

PIL = H∞(H) − H∞(H|L), (1)

where H denotes the private input and L the adversary's observations; the initial uncertainty is H∞(H) and the residual uncertainty is H∞(H|L).
In the uniform-distribution case, we denote the numbers of states consistent with the prior and posterior knowledge as t_prio and t_post, respectively; the remaining uncertainty then gives a security guarantee. The expected probability that the adversary guesses H given L decreases exponentially with H∞(H|L): 2^(−H∞(H|L)) = 2^(−log₂ t_post) = 1/t_post, and we obtain the privacy information leakage:

PIL = log₂ t_prio − log₂ t_post. (2)
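To make the metric concrete, the following Python sketch evaluates Equation (2) for the uniform-distribution case; the function name and the example counts are illustrative only.

```python
import math

def privacy_information_leakage(t_prio: int, t_post: int) -> float:
    """PIL for the uniform-distribution case (Equation (2)):
    PIL = H_inf(H) - H_inf(H|L) = log2 t_prio - log2 t_post."""
    return math.log2(t_prio) - math.log2(t_post)

# The adversary's expected guessing probability is 2**(-H_inf(H|L)) = 1/t_post:
print(privacy_information_leakage(t_prio=16, t_post=4))  # 2.0 bits leaked
```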

Opponent-Aware Privacy-Preserving Planning
In privacy-preserving planning (PPP), it is important to acknowledge that two potential opponents are involved: the malicious teammate (cooperator) and the hostile observer (adversary). PPP should produce plans that reveal neither the goal nor the activities of the agents, but many planners cannot achieve completeness, strong privacy preservation, and efficiency together. So, it is practical to aim for opponent-aware privacy preservation within a bounded privacy information leakage. As illustrated in Figure 3, privacy leakage can occur at the information layer and at the decision-making layer. At the information layer, differential privacy and homomorphic encryption are applicable techniques to protect private information. In this paper, we mainly focus on the middle layer, decision-making. For task planning in cooperative environments, we need to restrict the information shared with malicious teammates; for path planning in adversarial environments, we need to control the observability of the adversary. Here, we define the opponent-aware privacy-preserving planning problem as follows:

Definition 5 (Opponent-aware privacy-preserving planning). Opponent-aware privacy-preserving planning is a multi-agent planning problem in which privacy is kept secure to an admissible extent against the two potential opponents.
As a result, the generated plans protect privacy from two potential opponents: the malicious teammate and the hostile observer. In a cooperative setting, to cope with malicious teammates, we restrict the disclosure of private information to them. In an adversarial setting, real combat scenarios often involve hostile observers, so we add obfuscation to control their observability.

Information Sharing Restricted Task Planning
In cooperative environments, the agents cooperate in concurrently planning and executing their local plans to achieve a joint goal. We model all other agents as a single adversary who can collect information to infer more. Information sharing restricted task planning with privacy preservation can be defined as follows [10]:

Definition 6 (Information sharing restricted task planning). For a set of agents N, the information sharing restricted task planning problem M = {Π_i}_{i=1}^{|N|} is a set of agent problems, where for each agent n_i ∈ N the problem is

Π_i = ⟨V_i = V_pub ∪ V_priv_i, A_i, I, G⟩,

where V_i is a set of variables such that each V ∈ V_i has a finite domain dom(V) (if |dom(V)| = 2 for all V, the variables are binary); V_pub is the set of public variables common to all agents and V_priv_i is the set of variables private to agent n_i; A_i is the set of actions of agent n_i; I is the initial state and G is the goal.
Each action is defined as a tuple a = ⟨pre(a), eff(a), cost(a)⟩, where pre(a) and eff(a) are partial states representing the precondition and the effect, respectively, and cost(a) is the cost of action a. The state transition can then be defined as Γ(s, a) |= s ∪ eff(a). We follow the formal treatment of privacy-preserving planning from [10,30]: for each agent n_i ∈ N, the private parts of the problem Π_i are the private variables V_priv_i (together with their domains) and the private actions A_priv_i = A_i \ A_pub_i.
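The following Python sketch illustrates the structure of Definition 6 and the transition function Γ; the class and field names are our own and do not come from the planner implementations cited above.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    pre: dict    # partial state: variable -> required value
    eff: dict    # partial state: variable -> resulting value
    cost: float = 1.0

@dataclass
class AgentProblem:
    """Pi_i = <V_i = V_pub ∪ V_priv_i, A_i, I, G> (Definition 6)."""
    v_pub: set   # public variables, common to all agents
    v_priv: set  # variables private to this agent
    actions: list
    init: dict   # initial state I (variable -> value)
    goal: dict   # goal G as a partial state

def applicable(state: dict, a: Action) -> bool:
    """pre(a) holds in state s."""
    return all(state.get(v) == val for v, val in a.pre.items())

def transition(state: dict, a: Action) -> dict:
    """Gamma(s, a): the successor state s ∪ eff(a)."""
    nxt = dict(state)
    nxt.update(a.eff)
    return nxt
```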

Task Plan Generation
The multi-agent planning problem can be viewed from different perspectives, called projections. The view of a single agent n_i on the global problem is not only Π_i; projections of the other agents are available as well. For agent n_i, the public projection of an action a ∈ A_pub_i is a▷ = ⟨pre(a)▷, eff(a)▷⟩, where ▷ keeps only the public variables, and the public projection of Π_i can be represented as

Π_i▷ = ⟨V_pub, A_i▷ = {a▷ | a ∈ A_pub_i}, I▷, G⟩.

So, the task planning solution to Π_i is a sequence π_i of actions from A_i ∪ ⋃_{j≠i} A_j▷ reaching the goal state, i.e., Γ(I, π_i) |= G. The public projection of π_i is π_i▷ = (a_1▷, ..., a_k▷) with all private actions omitted. The global solution of M is a set of task plans {π_i}_{i=1}^{|N|} such that each π_i is a local solution to Π_i. If π_i▷ = π_j▷, we call these local solutions publicly equivalent.
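A minimal sketch of the public projection, reusing the Action class from the previous sketch; restricting preconditions and effects to public variables is the operation assumed here.

```python
def project_action(a: Action, v_pub: set) -> Action:
    """Public projection a▷: keep only public variables in pre/eff."""
    return Action(name=a.name,
                  pre={v: x for v, x in a.pre.items() if v in v_pub},
                  eff={v: x for v, x in a.eff.items() if v in v_pub},
                  cost=a.cost)

def project_plan(plan, public_names, v_pub):
    """Public projection pi▷: drop private actions, project the rest."""
    return [project_action(a, v_pub) for a in plan if a.name in public_names]
```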

Privacy Leakage Analysis
We adopt the privacy leakage metric from [47,48], where the number of private variables is bounded by |V_priv_i| ≤ p and the maximal domain size is d = max_{V ∈ V_priv_i} |dom(V)|. The prior information is a tuple comprising the public projection of the problem together with the bounds p and d. The additional information obtained by the adversary is the sequence of messages exchanged between the agents (n_1, ..., n_k); after the information exchange during the planning process, the posterior information available to the adversary is the tuple extending the prior information with this message sequence. Considering the transition system of Π_i, we associate the prior information I_prio and the posterior information I_post with the numbers τ(I_prio) and τ(I_post) of private transition systems consistent with them, which represent the uncertainty of the adversary about the planning algorithm's input. So, the final information leakage is computed as

PIL_i = log₂ τ(I_prio) − log₂ τ(I_post).

The upper bound on the number of all transition systems is t_0 = (2^(d²) − 1)^p. After classifying the actions into five categories, i.e., initially applicable (ia), not initially applicable (nia), privately dependent (pd), privately independent (pi), and privately nondeterministic (pn) [47], the final information leakage is obtained by counting, for each class, the transition systems still consistent with the observed behavior. In this paper, we mainly use the MAFS algorithm for task planning, and the privacy leakage is computed as follows: we first reconstruct the search tree, then identify the parent states and the applied actions, and classify the actions into the five classes (ia, nia, pd, pi, pn); finally, we compute the information leakage (see Algorithm 1 for details). The privacy leakage computation over sets of actions can be reformulated as a mixed-integer linear program (MILP) with disjunctive constraints. For the possible number of transition systems, we construct a combinatorial optimization problem that can be solved with the off-the-shelf solver IBM CPLEX [49], where t_X ≤ t_0 for each action type X ∈ {ia, nia, pd, pi, pn} and A_X ⊆ A▷.
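A small Python sketch of the leakage computation, under the assumption that t_post, the number of transition systems consistent with the observed action classes, has already been obtained (e.g., from Algorithm 1 or the MILP); the function names are illustrative.

```python
import math

def t0_upper_bound(p: int, d: int) -> int:
    """Upper bound on the number of candidate private transition systems:
    t0 = (2**(d*d) - 1)**p, i.e., any non-empty edge set over the d*d
    ordered value pairs, for each of the p private variables."""
    return (2 ** (d * d) - 1) ** p

def task_plan_leakage(t_post: int, p: int, d: int) -> float:
    """PIL = log2 t_prio - log2 t_post with t_prio = t0; t_post is the
    number of transition systems still consistent with the observed
    ia/nia/pd/pi/pn classification (e.g., from Algorithm 1 or the MILP)."""
    return math.log2(t0_upper_bound(p, d)) - math.log2(t_post)

print(task_plan_leakage(t_post=3, p=1, d=2))  # t0 = 15, leakage ~ 2.32 bits
```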

Observability Controlled Path Planning
In adversarial environments, the observed agents try to control the observation of the adversary by obfuscating their goals. Considering the observation of the adversary in adversarial settings (mission planning, reconnaissance, etc.), privacy immediately follows from a setting with partial observation [28,35,40]. The observability controlled path planning problem is to find a path from the start location to the goal on the navigation map (a discrete grid, connected graph, or continuous space representation). The discrete path planning problem can be defined as follows:

Definition 7 (Observability controlled path planning). For every agent n ∈ N, the observability controlled path planning problem is a tuple [35]:
• D = ⟨S, A, c⟩ is the path planning domain, where S is a non-empty set of location nodes, A ⊆ S × S is a set of action-related edges, and c : A → R≥0 returns the cost of traversing each edge;
• I ∈ S is the start location and g_r ∈ G is the real goal;
• G = {g_r, g_0, g_1, . . . } is a set of candidate goals, where g_r is the real goal;
• Ω = {o_i | i = 1, . . . , m} is a set of m observations that can be emitted as a result of the action taken and the state transition;
• O : (A × S) → Ω is a many-to-one observation function that maps the taken action and the next state reached to an observation in Ω.
In adversarial environments, the adversary receives the observation sequence associated with the actions performed by the observed agent. We model this process as a one-sensor model, in which the adversary maintains a belief space according to the observations. Following the definition of belief space from [35], we take the belief space of the adversary into account during path planning, so as to control the observability of the adversary.
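A minimal sketch of the adversary's one-sensor belief update under the many-to-one observation function O; the edge and observation-table encodings are assumptions made for illustration.

```python
def update_belief(belief, observation, edges, obs_fn):
    """One step of the adversary's belief update.
    belief: set of states the adversary considers possible;
    edges: set of (s, a, s_next) transitions of the path planning domain;
    obs_fn: dict (a, s_next) -> observation, the many-to-one function O."""
    return {s_next for (s, a, s_next) in edges
            if s in belief and obs_fn[(a, s_next)] == observation}
```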

Path Plan Generation
Decoy Goals: Suppose the adversary is aware of the actor's candidate goals but not the real goal. An observability controlled path plan then hides the real goal among decoy goals, where G_n = {g_0, . . . , g_{n−1}} is the set of decoy goals; the observation sequence should not be biased toward any goal in G_n. The objective is to minimize the privacy leakage by keeping the adversary's belief space consistent with the decoy goals.

Definition 9 (Observability controlled path plan with decoy goals). A path plan π_k is an observability controlled path plan with decoy goals iff

Γ(I, π_k) |= g_r and |{G ∈ G : ∃s ∈ b_n, s |= G}| ≥ m, (14)

where m < n; for ease of computation, we set m = 2 in this paper, since in the final stage path generation will only depend on two goals.
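A sketch of the check behind Definition 9, assuming the adversary's final belief b_n is given as a set of states and each goal is encoded as a partial state; the helper names are hypothetical.

```python
def entails(state, goal):
    """s |= G for a goal given as a partial state (hypothetical encoding)."""
    return all(state.get(v) == val for v, val in goal.items())

def is_observability_controlled(final_belief, candidate_goals, m):
    """Definition 9: at least m candidate goals must remain consistent
    with the adversary's final belief b_n."""
    consistent = [g for g in candidate_goals
                  if any(entails(s, g) for s in final_belief)]
    return len(consistent) >= m
```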
Observability Controlled Path: Predictability and obfuscation are a pair of incompatible concepts. With the decoy goals selected, an agent executing an observability controlled plan will start with obfuscated steps but will adopt predictable steps when finally approaching the goal. So, an observability controlled path is one whose steps remain obfuscated for as long as possible; there is one obfuscated turning point after which all subsequent steps are predictable.
We employ a probabilistic goal recognition model as the adversary's sensor model; a step is obfuscated if the real goal is never more likely than the decoys:

P(g_r | O_n) ≤ P(g | O_n), ∀g ∈ G_n \ {g_r}. (15)

Definition 11 (Last obfuscated turning point). A last obfuscated turning point is the last state π_i of a given path π for which all subsequent states π_j, ∀j ∈ {i + 1, . . . , |π|}, are predictable to the adversary.
Here, we mainly focus on the last obfuscated turning point (LOTP). The observability controlled path plan covers two parts, as shown in Figure 4: an obfuscated part from the start point to the LOTP, and a predictable part from the LOTP to the real goal. We obtain a strongly goal-obfuscated path π with continually obfuscated steps up to the LOTP. Using the cost-difference-based probabilistic goal recognition model introduced in [50], after selecting the decoy goals, the LOTP is the last state s ∈ π satisfying

optc(s, g_r) − optc(I, g_r) ≥ optc(s, g_d) − optc(I, g_d),

where g_d is the selected decoy goal and optc(a, b) is the optimal cost from state a to state b. If we adopt discrete-grid or graph-based domain representations for path planning, we approximate the LOTP by the closest state.
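A grid-based sketch of the LOTP computation under the cost-difference condition above, with optc realized by breadth-first search on a 4-connected grid of unit-cost free cells; the function names are illustrative.

```python
from collections import deque

def optc(free_cells, src):
    """Optimal cost (BFS distance) from src to every cell of a
    4-connected grid with unit step costs."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in free_cells and nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return dist

def last_obfuscated_turning_point(path, free_cells, start, g_real, g_decoy):
    """Last state s on the path with
    optc(s, g_r) - optc(I, g_r) >= optc(s, g_d) - optc(I, g_d),
    i.e., the decoy is still at least as likely as the real goal."""
    d_real = optc(free_cells, g_real)    # optc(., g_r)
    d_decoy = optc(free_cells, g_decoy)  # optc(., g_d)
    lotp = start
    for s in path:
        if d_real[s] - d_real[start] >= d_decoy[s] - d_decoy[start]:
            lotp = s
    return lotp
```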
Diverse Path: When the adversary knows the observed agent's goal, in order to control the adversary's observability, we need diverse paths. We can compute the diversity between all pairs of plans using a plan distance metric described in Appendix A.1. Two plans are a δ-distant pair with respect to distance metric d if d(p_1, p_2) = δ. A path plan set (PPS) induced by plan p starting at I is minimally δ-distant if δ = min_{p_1,p_2 ∈ PPS} d(p_1, p_2).

As a result, if the adversary does not know the real goal, the first part of the path is obtained by planning toward two decoy goals; after finding the LOTP, we can compute the whole path plan. If the adversary does know the real goal, we generate diverse path plans instead. The details of observability controlled path planning are given in Algorithm 2.
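A sketch of the δ-distance computation, using an action-set (Jaccard-style) distance as a stand-in for the plan distance metric of Appendix A.1, which is not reproduced here.

```python
from itertools import combinations

def action_set_distance(p1, p2):
    """Jaccard-style action set distance between two paths (one common
    plan distance metric; assumed here in place of Appendix A.1)."""
    s1, s2 = set(p1), set(p2)
    return 1.0 - len(s1 & s2) / len(s1 | s2)

def min_pairwise_distance(plan_set):
    """delta of a path plan set: minimum distance over all plan pairs."""
    return min(action_set_distance(p1, p2)
               for p1, p2 in combinations(plan_set, 2))
```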

Privacy Leakage Analysis
Planning with obfuscated goals involves preserving privacy with minimized information leakage. Under the requirement of privacy preservation, the observed agent deliberately chooses misleading actions to obfuscate the goal. We quantify the information leakage of states and actions as follows:

Definition 14 (A-PI). For a_i ∈ E(s) and a_j ∈ E′(s), where E(s) is the set of actions executable in state s and E′(s) = E(s) \ {a_i}, the action-based privacy information metric I_A−PI(s, a_i) is defined over the posterior goal probabilities induced by taking a_i rather than any alternative a_j.

Using the state and action privacy information metrics as additional action costs, we can analyze the privacy leakage of the observability controlled path plan.

Experiments
In this section, we present experiments on opponent-aware privacy-preserving planning.
All the experiments were executed on an Alienware machine running Ubuntu 16.04 with 4 CPU cores and 8 GB of RAM. We used the MAFS algorithm [30] for information sharing restricted task plan generation. The algorithms for privacy leakage analysis and observability controlled path planning were implemented in Python.

Plan Generation and Privacy Leakage Analysis
Here, we first generate task plans for a robot in the urban security patrol scenario. Then we present three different goal configuration scenarios for path planning and analyze the privacy leakage of the task plans and path plans. Finally, we present an indoor robot demonstration using the TurtleBot3 Burger [51].

Task Plan Generation and Privacy Leakage Analysis
As shown in Figure 1, we now define some variables for the security patrol scenario. The simplified security patrol scenario can be modeled as an interaction among four agents: two UGVs, one supply center (the malicious teammate), and the hostile observer (the adversary).
Variable Definition: For task planning in a cooperative environment, N = {UGV1, UGV2, SC}. After patrolling any candidate checkpoint in zone 1, the UGV returns to the supply center to charge and transmit the collected data; the task is completed after both zones have been patrolled.
As shown in Table 2, we set binary variables with T/F values. In the initial state, the supply center has enough supplies and the UGV is charged. In the goal state, the task of the UGV is complete. The following set of variables can be used to describe the task planning problem.

Table 2. Variables for task planning.
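As an illustration, the initial and goal states described above might be encoded as follows; the variable names are placeholders for those listed in Table 2.

```python
# Illustrative encoding of the binary task variables (cf. Table 2);
# the variable names are placeholders, not the paper's exact ones.
init = {"sc_supplied": True, "ugv1_charged": True,
        "ugv1_at_sc": True, "zone1_patrolled": False,
        "ugv1_task_complete": False}
goal = {"ugv1_task_complete": True}
```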
Information Sharing Restricted Task Plan: Here, we simply assign UGV1 to zone 1 and UGV2 to zone 2. Each UGV chooses two checkpoints to patrol (e.g., checkpoints 1 and 3). The actions A_UGV1 and A_SC can be formulated as shown in Table 3, and the action descriptions for the UGVs and the supply center are provided in Figure 5.
In the following, we compute the task plan of UGV1 for zone 1 security patrol. The projections of the public actions and the related transition system are shown in Figure 6, and the projection results of the public actions are shown in Table 4. The actions PC1, PC3 ∈ A_UGV1 and RC, RR ∈ A_SC have equal public projections, which we denote simply as PC▷ and R▷. We chose the MAFS and Secure-MAFS algorithms for task plan generation. The solution of UGV1 to the security patrol scenario is π_UGV1▷ = {R▷, PC▷, R▷, PC▷, TC}, which is public to the supply center.
Privacy Leakage Analysis: The complete transition system is shown in Figure 7. In MAFS, if a state of the UGV is expanded using a public action, the resulting public projection state is sent to the supply center. We analyzed the privacy leakage based on the states sent and received by the UGV.

Path Plan Generation and Privacy Leakage Analysis
Observability Controlled Path Plan: As shown in Figure 8, we used a 13 × 13 discrete-grid simulation environment with different goal configurations (line, circular, and triangular) for experimental evaluation. We simply set m = 2 and k = 2, and the UGV patrolled one checkpoint through an observability controlled path and chose a diverse path back to the supply center. For each checkpoint, after choosing the candidate decoy checkpoints, we used Algorithm 2 to generate an observability controlled path.

Privacy Leakage Analysis: Following the "single-observation" cost-difference-based probabilistic goal recognition model from [50], we pre-computed the cost difference for each state offline to calculate the likelihood that each goal is the selected checkpoint. As shown in Figure 9, we created heatmaps for the discrete-grid domain, showing the posterior probability of each goal at each state. Armed with the heatmaps, we used the state/action privacy information metrics (Equations (13) and (14)) for privacy leakage analysis. The privacy leakage results of the paths to each checkpoint under the different configurations are shown in Table 5.
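A sketch of one heatmap cell, assuming a Boltzmann posterior over the cost differences; the exact posterior form used by the model in [50] may differ, and the parameter beta is an assumption.

```python
import math

def goal_posterior(s, start, goals, d_start, d_goal, beta=1.0):
    """Posterior P(g | s) for one heatmap cell (cf. Figure 9), using
    costdif(s, g) = optc(I, s) + optc(s, g) - optc(I, g) and an assumed
    Boltzmann posterior P(g | s) ~ exp(-beta * costdif(s, g)).
    d_start[s] = optc(I, s); d_goal[g][s] = optc(s, g)."""
    weight = {g: math.exp(-beta * (d_start[s] + d_goal[g][s]
                                   - d_goal[g][start]))
              for g in goals}
    total = sum(weight.values())
    return {g: w / total for g, w in weight.items()}
```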

Indoor Robot Demonstration
To simulate the security patrol scenario with an internal robot and an external human observer, we used the TurtleBot3 Burger for an indoor robot demonstration. The TurtleBot3 Burger is a mobile robot platform built on ROS (Robot Operating System); Table 6 shows its configuration. As shown in Figure 10, the TurtleBot3 Burger contains several modules; we designed the ROS nodes of the software framework and built an experimental scene with four checkpoints, with the robot initially located in the middle of the scene. As shown in Figure 11, after generating the information sharing restricted task plan, the robot generates an observability controlled path plan for checkpoint patrol and follows the generated path to visit each checkpoint. The trajectories of the robot and the objects in the scene were visualized through RViz, and the environment map was built with the LDS-01 lidar.
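For reference, a minimal waypoint patrol loop of the kind used in such a demonstration, assuming the standard ROS move_base navigation stack; the node name and waypoints are illustrative, and the demonstration's full node design is larger than this sketch.

```python
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

def goto(client, x, y):
    """Send one checkpoint pose to move_base and wait for arrival."""
    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = "map"
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    goal.target_pose.pose.orientation.w = 1.0
    client.send_goal(goal)
    client.wait_for_result()

if __name__ == "__main__":
    rospy.init_node("patrol_executor")
    client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
    client.wait_for_server()
    # Waypoints of the generated observability controlled path (illustrative).
    for x, y in [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]:
        goto(client, x, y)
```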

Conclusions and Future Work
In this paper, the opponent-aware privacy-preserving planning problem in a contested environment is addressed and the two questions raised are answered. Owing to the rapid growth of privacy preservation in planning, we first define opponent-aware privacy-preserving planning. Then, we present approaches for information sharing restricted task plan generation and observability controlled path plan generation. The final experiments with privacy leakage analysis and the indoor robot demonstration show the applicability of the proposed approaches. In fact, much research has modeled the interaction between patrol UGVs and an adversary with Stackelberg or stochastic games, in which the agents pursue utility maximization. Additionally, many robust and online goal recognition approaches have been proposed, such as the self-modulating model for rational and irrational agents proposed in [38]. In the future, we will model this problem as a stochastic game with active adversaries.