Game Theory for Unmanned Vehicle Path Planning in the Marine Domain: State of the Art and New Possibilities

Abstract: Thanks to the advent of new technologies and higher real-time computational capabilities, the use of unmanned vehicles in the marine domain has received a significant boost in the last decade. Ocean and seabed sampling, missions in dangerous areas, and civilian security are only a few of the large number of applications which currently benefit from unmanned vehicles. One of the most actively studied topics is their full autonomy; i.e., the design of marine vehicles capable of pursuing a task while reacting to the changes of the environment without the intervention of humans, not even remotely. Environmental dynamicity may consist of variations of currents, the presence of unknown obstacles, and attacks from adversaries.


Introduction
The marine domain deserves dedicated theory and schemes for the autonomous optimal path planning of unmanned vehicles. On the one hand, traversing the water surface can be rather challenging. In addition to adverse weather conditions and currents, in some regions of the world, maritime piracy is a severe issue [1,2]. Therefore, the design of secure routes and defense mechanisms has now become of worldwide interest [3].
On the other hand, the underwater environment poses an even greater challenge for the path planning of autonomous underwater vehicles (AUVs) because of its hostile and dynamic nature [4][5][6][7]. The major constraints for path planning are the limited data transmission capability and the power and sensing technology available for underwater operations. The sea environment is subject to a large set of challenging factors, classified as atmospheric, coastal, and gravitational factors. Above water, most autonomous systems rely on radio or spread-spectrum communications along with the global positioning system (GPS). This is not possible in underwater environments, where AUVs must rely on acoustic-based sensing and communication (longer range but lower data rates, smaller bandwidth, higher latency, and unreliability). Thus, with such restricted power and without directional information, it is very difficult for an AUV or underwater glider to navigate towards the desired target.
Game theory [8] is one of the mathematical tools that has proved very effective for modeling and solving some of these real maritime challenges. This paper aims to provide an overview of this synergistic coupling (marine path planning problems and game theory) by reviewing the state of the art. We propose a categorization of planning tasks into six game-theoretic classes: pursuit-evasion games, coverage games, search games, rendezvous games, coordination games, and patrolling games. Then, for each of them, we discuss concrete applications in the marine environment; e.g., the search for underwater mines or the surveillance of maritime routes.
The work is organized as follows: Section 2 briefly recalls the game-theoretic background needed to understand the review; then, Sections 3-7 discuss marine autonomous path planning problem modeling in accordance with the six classes and their solution leveraging game-theoretic tools; Section 8 proposes possible future research directions and potentially fruitful synergies; finally, Section 9 summarizes the work and provides the conclusions.

Game Theoretic Background
This section recalls some fundamental knowledge that is useful for a better understanding of the work. We first review basic game-theoretic concepts and then introduce the path planning problems typically faced in the marine domain and tackled by game-theoretic tools.

Game Theory
Game theory [8] is a mathematical discipline that studies the interaction (either competitive or collaborative) between two or more rational agents, commonly called players. The term game refers to the mathematical model that describes the interactions among the players: their possible strategies, their incomes (known as payoffs), the effect the environment can have on their choices, etc. Rarely, games can involve just one player; in these cases, an additional player, usually known as Nature, is added to the game in order to represent the uncertainty which affects the agent's choices [9]. Each game can be characterized by several properties; the next lines review the properties which concern the models discussed in this work. Details and further information can be found in [8][9][10][11] and references therein.
A game can be either cooperative or non-cooperative depending on the basic modeling unit: in the latter, this consists of an individual, while in the former, it is a group of individuals. Notice that in both cases the players can be considered selfish agents: indeed, they always seek to maximize the utility of the basic modeling unit. A game is said to be symmetric if the players share the same set of strategies and their payoffs depend only on the strategy profile adopted (i.e., the strategies played by the agents), while the identity of the player who adopted a particular behavior is irrelevant. All other games are asymmetric. To model agents' stochastic behavior, a game can allow reasoning in terms of mixed strategies. Given a set of strategies Ω for a player, henceforth called pure strategies, the set of mixed strategies ∆ for the same player is defined as the set of all possible probability distributions over Ω. Mixed strategies are particularly useful when considering repeated games; i.e., games that periodically allow the agents to interact and earn payoffs. The time horizon can vary depending on the game: it can be infinite, finite but unknown, finite and known, or null (in this last case, the game is not a repeated game).
The choice of which strategy to adopt depends on a large number of aspects; however, the most important is the player's utility function: a function representing the agent's satisfaction with each possible strategy profile. Notice that this assumption does not prevent the losses of the opponents (or any other harmful interpretation of the interaction) from being represented by the utility function. One further decision element is the information the players have about the game, which is described by two orthogonal features: completeness and perfection. The information is said to be complete if the players are aware of all the properties of the participants: utility functions, sets of strategies, typology (e.g., risk-averse), etc. On the other hand, information is said to be perfect if, in a repeated game, the players are always aware (directly or indirectly) of all the previous choices (read: moves) made by the participants. Finally, some strategies can be discarded a priori by a player because they are dominated. A strategy is said to be dominated if there is another (mixed or pure) strategy that always yields, whatever the other players do, a strictly higher payoff.
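As an illustration, domination of one pure strategy by another can be checked directly on a payoff matrix. The sketch below (with an invented 3x3 payoff matrix, not taken from any paper reviewed here) tests only for domination by pure strategies; detecting domination by mixed strategies requires linear programming and is omitted.

```python
# Detect pure strategies strictly dominated by another pure strategy.
# Illustrative payoff matrix for the row player (invented values).

def strictly_dominated(payoffs):
    """Return indices of row strategies strictly dominated by another row.

    payoffs[i][j] = row player's payoff when playing row i against column j.
    Note: a strategy may also be dominated only by a mixed strategy; that
    stronger test needs linear programming and is not performed here.
    """
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    dominated = []
    for i in range(n_rows):
        for k in range(n_rows):
            if k != i and all(payoffs[k][j] > payoffs[i][j] for j in range(n_cols)):
                dominated.append(i)
                break
    return dominated

A = [[3, 0, 2],
     [1, -1, 0],   # strictly worse than row 0 against every column
     [0, 4, 1]]
print(strictly_dominated(A))  # [1]
```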
Games can also be categorized in accordance with less elementary aspects. For instance, the property of being a zero-sum game comes from the fact that, regardless of the strategy profile adopted, the payoffs of all players always sum up to zero. In contrast, if the set of pure strategies is discrete for all the players, the game is referred to as a tensor game. This is because the outcome of any strategy profile can be represented by an entry of a properly built tensor where each dimension is associated with a player. If the game involves only two players, it is called a bimatrix game, as one matrix per player is enough to model their strategic interaction. A two-player zero-sum game with discrete strategy spaces is simply a matrix game. Indeed, the losses of one player are the earnings of the other (zero-sum property), and therefore a single matrix is enough to summarize all the information. A potential game is a cooperative game where the incentive of all players to change their strategy can be expressed using a single global function called the potential function. A bargaining game admits both a cooperative and a non-cooperative interpretation. Both cases model a scenario in which the players have to find an agreement on the strategies to play; if no agreement is found, the so-called disagreement payoff (or better, the disagreement point in the payoff space) is assigned and the game ends. In the most common scenario, utility is not transferable, which means that proposing a strategy profile whose overall income is greater than the total disagreement payoff is not enough to reach an agreement. Stackelberg games model scenarios where a subset of the players (referred to as leaders) acts before the remaining ones (the followers). In particular, the latter choose the strategy to adopt after having observed the ones adopted by the leaders and their effect on the game.
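For matrix games, the zero-sum structure makes mixed equilibria approachable by simple iterative schemes. Below is a minimal fictitious-play sketch on the classic matching-pennies matrix game (an illustrative textbook example, not a model from the reviewed papers): each player repeatedly best-responds to the opponent's empirical strategy frequencies, which for two-player zero-sum games converge to a Nash equilibrium in mixed strategies.

```python
# Fictitious play on the matching-pennies matrix game. Each player
# best-responds to the opponent's empirical frequencies; for two-player
# zero-sum games these frequencies converge to a Nash equilibrium
# (here the uniform mixed strategy, with game value 0).

A = [[1, -1],
     [-1, 1]]                            # row payoffs; column player gets -A

row_counts, col_counts = [1, 0], [0, 1]  # arbitrary initial beliefs

for _ in range(20000):
    # Row player maximizes expected payoff against the column's mixture.
    row_br = max(range(2), key=lambda i: sum(A[i][j] * col_counts[j] for j in range(2)))
    # Column player minimizes the row player's expected payoff.
    col_br = min(range(2), key=lambda j: sum(A[i][j] * row_counts[i] for i in range(2)))
    row_counts[row_br] += 1
    col_counts[col_br] += 1

row_freq = row_counts[0] / sum(row_counts)
col_freq = col_counts[0] / sum(col_counts)
print(round(row_freq, 2), round(col_freq, 2))  # both close to 0.5
```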
A differential game involves players who jointly control (through their actions over time, as inputs) a dynamical system described by differential state equations. Hence, the game evolves over a continuous-time horizon, during which each player is interested in maximizing their utility. The latter depends on the state variable of the dynamic, i.e., the game, on the self-player's action variable (different actions can require different efforts to be implemented and be used as control input of the system), and also possibly on other players' actions. A game is said to model a bilateral symmetric interaction (BSI games) if the utility of each player can be decomposed into symmetric interaction terms which are bilaterally determined and a term depending only on the player's own strategy. As a corner case, two-person symmetric games are BSI games. Intuitionistic fuzzy games are a wider class of games in which payoffs are represented by intuitionistic fuzzy sets. This fact allows one to better model knowledge uncertainty, players' bounded rationality, hesitancy, and behavioral complexity. All the classical (or crisp) games are special cases of intuitionistic fuzzy games.

Rationality of the Players
The last notion to introduce is players' rationality. Without entering too deeply into this controversial topic, let us focus on the strategy profiles the players adopt while seeking to optimize their utility. In most cases, adopting the strategy that could yield the maximum possible utility may instead result in a notably lower payoff, since the outcome of a game also depends on the other players' choices. Put another way, the greedy approach is usually very easy for the other agents to punish. This fact has led to several definitions of strategy optimality and rational player behavior. Probably the most used notion in this sense is the Nash equilibrium: any strategy profile from which the unilateral deviation of one single player yields that player no benefit. Stackelberg equilibria represent the transposition of Nash equilibria to the case of Stackelberg (leader-follower) games. In spite of the clarity of what is meant by rationality in the previous two definitions, researchers have argued that they do not truly model human beings' actual rationality. One of the alternatives derived from this debate is the quantal response equilibrium [12,13], which is emerging as a very promising approach to model humans' bounded rationality [14,15]. It suggests that instead of strictly maximizing utility, individuals respond stochastically in games: the chance of selecting a non-optimal strategy increases as the cost of such an error decreases. In fact, the quantal response model assumes that humans choose better actions at a higher frequency, but with noise added to the decision-making process.
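A minimal sketch of the logit form of quantal response, with invented utility values: the rationality parameter controls how sharply the choice probabilities concentrate on the best strategy, so cheap mistakes stay likely while costly ones become rare.

```python
import math

# Logit quantal response: with rationality parameter lam, strategy i is
# played with probability proportional to exp(lam * u_i). Utilities are
# invented purely for illustration.

def logit_response(utilities, lam):
    weights = [math.exp(lam * u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

u = [1.0, 0.9, 0.0]             # the third strategy is a costly mistake
print(logit_response(u, 0.0))   # lam = 0: pure noise, uniform play
print(logit_response(u, 5.0))   # near-optimal but noisy play
# As lam grows, the distribution concentrates on the best strategy (u = 1.0).
```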

Path Planning Games in Marine Domain
Pursuit-evasion games are a subclass of differential games introduced in 1965 by Isaacs [16] that has received a large degree of attention since then, mainly for air combat scenarios. The basic version of a pursuit-evasion game consists of a two- or three-dimensional environment and a time horizon (finite or infinite) and involves two players: the pursuer and the evader. From a theoretical point of view, the optimal strategies of the agents are given by the solution of a nonlinear partial differential equation known as the Hamilton-Jacobi-Isaacs (HJI) equation. However, the problem is far from being solved in practice, since solutions of HJI equations are not available in general. Level-set schemes have, however, been shown to be very efficient at solving these governing planning equations [17][18][19][20][21], more so than graph search schemes [22]. They have been used onboard real ocean vehicles at sea [23] and employed to solve pursuit-evasion games [24], as reviewed in Section 3.
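To give a flavor of the level-set machinery, the sketch below propagates a reachability front for a vehicle with constant speed and no currents, obstacles, or adversary; real marine planners solve far richer versions of this equation. Grid resolution, speed, and time step are illustrative assumptions.

```python
import numpy as np

# Expanding-front sketch of the level-set method: the set reachable by a
# vehicle of constant speed F is the zero sublevel set of phi, evolved by
# phi_t + F * |grad phi| = 0 (first-order Godunov upwind scheme).
# No flow field, obstacles, or evader: this only illustrates the front
# propagation at the heart of the reachability approach.

n, F, dt, steps = 101, 1.0, 0.01, 30        # propagate for 0.3 time units
x = np.linspace(-1.0, 1.0, n)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x, indexing="ij")
phi = np.sqrt(X**2 + Y**2) - 0.2            # start: disk of radius 0.2

for _ in range(steps):
    # One-sided differences (zero-gradient conditions at the boundary).
    dxm = np.diff(phi, axis=0, prepend=phi[:1, :]) / dx
    dxp = np.diff(phi, axis=0, append=phi[-1:, :]) / dx
    dym = np.diff(phi, axis=1, prepend=phi[:, :1]) / dx
    dyp = np.diff(phi, axis=1, append=phi[:, -1:]) / dx
    # Godunov upwind gradient magnitude for an expanding front (F > 0).
    grad = np.sqrt(np.maximum(dxm, 0)**2 + np.minimum(dxp, 0)**2
                   + np.maximum(dym, 0)**2 + np.minimum(dyp, 0)**2)
    phi = phi - dt * F * grad

# The zero level set has grown from radius 0.2 to roughly 0.2 + F*steps*dt = 0.5.
```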
Coverage games are cooperative games with the goal of realizing a full and efficient coverage of an a priori unknown area by means of player movement. In case of autonomous vehicles, they are required to cooperatively scan the search area without human supervision. However, due to the lack of a priori knowledge of the exact obstacle locations, the trajectories of autonomous vehicles cannot be computed offline and need to be adapted as the environment is locally discovered. A typical scenario is sea surface oil spill cleaning [25,26].
Search planning games model the problem of scanning an area in search of something; e.g., undersea mines [27]. In this context, a good search plan is one that maximizes the efficiency of the search, expediting the discovery of the intended search objects while minimizing the number of search agents required to do so. However, false alarms compromise the efficacy of a search plan by disrupting search agents' capabilities to identify undersea objects of interest. When the detection is uncertain, an additional search effort must be applied to confirm or deny the nature of the contact. Search planning games are often one-person games (the searcher), while Nature plays the role of the hider of objects.
Rendezvous games are differential games that focus on generating optimal trajectories between a starting and a terminating point, away from hazardous regions and obstacles. Optimality may refer to several quality parameters; e.g., energy consumption or travel time. The game-theoretic aspect concerns how to model and solve the obstacle avoidance and, where present, the interaction between multiple autonomous vehicles along the path. As for pursuit-evasion games, the solution of the control problem is usually far from easy to compute, both in closed form and numerically. For instance, the solution to the minimum-time navigation problem in dynamic flows is governed by a Hamilton-Jacobi-Bellman equation [28] (calculus of variations). Examples in path planning include the interception of ships [29].
Coordination games seek to coordinate the motions of groups of vehicles, mobile sensors, and embedded robotic systems to be deployed over regions. These coordination tasks must be achieved while respecting communication constraints and with limited information about the state of the system. The motion cooperation may be achieved in several ways: from simply having more vehicles pursuing different pre-planned missions in different areas, to interaction among the vehicles during the mission, and to strict formation control [30,31].
Patrolling games describe security scenarios with limited resources, which prevent full security coverage at all times. Therefore, limited security resources must be deployed intelligently taking into account differences in priorities of targets requiring security coverage, the responses of the adversaries to the security posture, and potential uncertainty over the types, capabilities, knowledge, and priorities of the adversaries faced. Applications of patrolling games involve protecting critical national infrastructure and curtailing illegal smuggling (drugs, weapons, money, etc.), as well as protecting wildlife (fish and forests) from poachers and smugglers [32].
In the next sections, we review the main contributions in the literature discussing each type of game and concerning the dynamic marine environment.

Pursuit-Evasion Games
This section reviews the major contributions on pursuit-evasion games. In [24], a reachability-based approach is proposed to deal with the pursuit-evasion differential game between two players in the presence of dynamic environmental disturbances (e.g., winds, sea currents). In [33], the authors extend the previous work to the case of multiple pursuers. In [29], the authors validate the efficacy of the proposed methodology with testing in realistic data-assimilative simulated environments. A theoretical background and seminal studies on the use of this approach for path planning in the marine environment can be found in [17,34,35]. Reachability-based approaches may also suggest interpretations according to Blackwell's approachability theory [36]. Indeed, the pursuer seeks a time-dependent strategy that guarantees the approachability of any proper subset of their own target set; i.e., the evader's reachable set. The peculiar property here is that the target set also changes over time. An example of approachability theory applied to subsets of the target set is the geometric approach to multi-criterion reinforcement learning problems [37].
Reinforcement learning is also at the core of [38], where the problem of illegal and unreported fishing is modeled as a pursuit-evasion game between supervising autonomous vessels and poachers. The pursuer's optimal control is obtained by leveraging the fuzzy actor-critic learning algorithm [39,40]. Here, both the actor and the critic are modeled as fuzzy inference systems in order to cope with the natural uncertainty of the constantly changing environment, which reflects the noise and additional complexity in the action space. The effectiveness of the approach is evaluated on two different real-world scenarios: the Gulf of St. Lawrence (Canada) and the Bay of Fundy (Canada/USA).
In [41], a model is presented for integrated trajectory planning for non-cooperative unmanned systems via multi-agent rolling-horizon games. The authors consider a multiplayer maritime pursuit-evasion game in which players have opposing quadratic cost functionals based on concepts from optimal trajectory programming. The model generates a system of equilibrium trajectories for all players via a mixed complementarity problem formulation using the KKT (Karush-Kuhn-Tucker) optimality conditions. Rolling-horizon foresight and uncertain obstacles are incorporated into the model, both of which improve model performance in determining feasible solutions. In [42], the authors present a neural network-based approach to find an equilibrium solution (i.e., the players' optimal trajectories during the chase) via the minimax algorithm to an asymmetric skirmish between an unmanned underwater vehicle (UUV) and a manned submarine. In fact, each player is represented by a neural network which plays the role of the agent's cost function; i.e., the output of each neural network is the utility of the corresponding player. The input to the neural network is the vectorial representation of the current state of the repeated game as perceived by the player. Asymmetry is reflected by the fact that the agents have different capabilities, and they do not share the same view of the environment and its state. Notice that the desired output of the neural networks is unknown a priori. Therefore, the weights inside the networks are tuned using an evolutionary approach, rather than the backpropagation learning algorithm: a genetic algorithm refines the players' strategies by improving the neural network, which outputs the state utility upon which the players' actions are chosen. This means that the evolutionary procedure requires the game to be simulated over a number of steps until convergence to the Nash equilibrium.
In [43], the authors propose a cooperative dynamic maneuver decision-making algorithm based on intuitionistic fuzzy game theory. Fuzzy sets allow one to fully cope with underwater environments affected by different kinds of uncertainties. An ad hoc particle swarm optimization method [44] is used to compute the optimal strategy; i.e., the one that leads to the Nash equilibrium satisfying the intuitionistic fuzzy total order. To do this, the authors build a fuzzy payment matrix of the cooperative dynamic maneuver game from the fuzzy multi-attribute evaluation of an AUV maneuver strategy. The use of intuitionistic fuzzy theory makes the expression of uncertain information clearer and more accurate than the original fuzzy theory. The hesitancy better integrates into the model underwater uncertainties such as the changeable marine environment, the complex background noise, and communication difficulties. As a case study, pursuer-evader scenarios are considered. In [45], the very same authors improve the previous work, also accounting for the marine environment and the time-sequence characteristics of the situation information. To accomplish this task, they leverage fractional-order particle swarm optimization [46]. The latter is an enhanced version of the basic particle swarm optimization algorithm which mitigates the risk of becoming stuck in a local minimum by using fractional derivatives (rather than integer-order derivatives only), a tool which provides a memory of the past events of the search process (with importance decreasing over time) [47].
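A minimal sketch of the standard (integer-order) particle swarm optimization loop on a toy 2-D objective; the fractional-order variant replaces the velocity update with a fractional-derivative memory term. All parameter values here are illustrative assumptions.

```python
import random

# Standard particle swarm optimization sketch: minimize a 2-D sphere
# function. Swarm size, inertia, and attraction weights are illustrative.
random.seed(0)

def sphere(p):
    return p[0]**2 + p[1]**2

n_particles, iters = 30, 200
w, c1, c2 = 0.7, 1.5, 1.5                    # inertia, cognitive, social weights
pos = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(n_particles)]
vel = [[0.0, 0.0] for _ in range(n_particles)]
pbest = [p[:] for p in pos]                  # personal best positions
gbest = min(pbest, key=sphere)[:]            # global best position

for _ in range(iters):
    for i in range(n_particles):
        for d in range(2):
            r1, r2 = random.random(), random.random()
            vel[i][d] = (w * vel[i][d]
                         + c1 * r1 * (pbest[i][d] - pos[i][d])
                         + c2 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        if sphere(pos[i]) < sphere(pbest[i]):
            pbest[i] = pos[i][:]
            if sphere(pos[i]) < sphere(gbest):
                gbest = pos[i][:]

print(sphere(gbest))  # close to the optimum value 0
```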
Finally, Table 1 provides a classification of the aforementioned papers according to the approach each of them uses to solve the games.

Table 1. Approaches to solve pursuit-evasion games and papers which use them.

Coverage and Search Planning Games
In [48], a game-theoretic method is presented for the cooperative coverage of a priori unknown environments using a team of autonomous vehicles. The cooperative coverage method is based upon the concept of multi-resolution navigation, which combines local navigation and global navigation. The main advantages of this algorithm are that (i) the local navigation enables real-time, locally optimal decisions with reduced computational complexity by avoiding unnecessary global computations, and (ii) the global navigation offers a wider view of the area, seeking unexplored regions. This algorithm prevents autonomous vehicles from becoming trapped in local minima, a problem commonly encountered in potential field-based algorithms. As a practical application, the authors investigate the cooperative oil spill cleaning of sea surfaces, even though the concepts can be applied to the general class of coverage problems. Nevertheless, the essential issue of the dynamic behavior of the oil spill is not modeled. In [49], the authors introduce the possibility of unexpected vehicle failures during the coverage process. To cope with this, the authors propose a novel, distributed, and cooperative algorithm named CARE (Cooperative Autonomy for Resilience and Efficiency). In both works, the scenario is modeled using the theory of potential games [50], where the utility of each player is connected with an objective function shared by all players. In case of vehicle failures, CARE guarantees complete coverage by filling coverage gaps through the optimal reallocation of other agents. This provides a high level of resilience, albeit with a possibly small degradation in coverage time.
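The potential-game machinery can be illustrated with a toy congestion-style coverage game (invented values, not a model from [48,49]): agents choose regions whose value is split among the agents covering them, and sequential strict best responses provably terminate at a pure Nash equilibrium because each improvement increases Rosenthal's potential.

```python
# Best-response dynamics in a toy congestion-style coverage game: each of
# five agents picks one of three regions, and a region's value is split
# evenly among the agents covering it (a potential game by Rosenthal's
# construction). Strict best responses terminate at a pure Nash equilibrium.
# Region values and agent count are invented for illustration.

values = [6.0, 3.0, 1.0]
n_agents = 5
choice = [0] * n_agents               # all agents start in region 0

def best_response(agent, choice):
    """Return the region strictly improving this agent's share, or the
    current region if no strict improvement exists."""
    occ_cur = sum(1 for a in range(n_agents) if choice[a] == choice[agent])
    best, best_u = choice[agent], values[choice[agent]] / occ_cur
    for r in range(len(values)):
        if r == choice[agent]:
            continue
        occ = sum(1 for a in range(n_agents) if a != agent and choice[a] == r) + 1
        if values[r] / occ > best_u:
            best, best_u = r, values[r] / occ
    return best

changed = True
while changed:                        # finite by the potential argument
    changed = False
    for agent in range(n_agents):
        br = best_response(agent, choice)
        if br != choice[agent]:
            choice[agent] = br
            changed = True

print(sorted(choice))  # a pure Nash equilibrium allocation of agents
```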
In [51,52], the authors address multiple-UUV search planning to find hidden objects (e.g., mines in undersea environments) where the sensor detection process is subject to false alarms with a geographically varying likelihood. The authors developed a game-theoretic approach to maximize the information flow that occurs as a multi-agent collaborative search is conducted over a bounded region. To accomplish this, they leverage the search channel formalism [53], an information-theoretic tool which models the information flow over small regions (cells) of the search space. It allows the authors to compute the information measure of each cell as a function of searcher regional visitation. This translates into the execution of a Receiver Operating Characteristic (ROC) analysis [54]: the search channel quality is represented as a ROC curve, a plot which establishes the relationship between the probability of detecting the target object and the related probability of declaring false positives. This allows the authors to map the search strategies (read: the planning problem of choosing the search path which maximizes the information collected) to the inference of ROC operating points; i.e., the detection thresholds that maximize search performance. In this way, the game payoff is represented by two terms: the cost due to the search effort and the benefit of the information collected. In [55], a more detailed investigation of such search games is provided. In particular, it analyzes the properties of the information measure in the channel and the impact of having multiple ROCs available to choose from when generating information. In [56], the authors further improve the model by allowing different search horizons to be set within the area search game.
Although the study is conducted under very restrictive hypotheses, the preliminary evidence suggests a need to balance performance criterion satisfaction with the opportunity to accelerate the search by reacting to observed detection events. Additional studies leveraging ROC information for search planning are [57,58].
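The role of a ROC operating point can be sketched with a toy one-look decision: given sampled (P_fa, P_d) pairs on a sensor's ROC curve, pick the threshold maximizing the expected payoff of inspecting a cell. The ROC samples, target prior, and payoff weights below are invented for illustration.

```python
# Toy ROC operating-point selection for one look at a search cell: choose
# the (P_fa, P_d) threshold on a sampled ROC curve that maximizes the
# expected payoff. All numeric values are invented assumptions.

roc = [(0.00, 0.00), (0.02, 0.45), (0.05, 0.70),
       (0.10, 0.82), (0.20, 0.90), (0.50, 0.97), (1.00, 1.00)]

p_target = 0.1       # prior probability that the cell contains a target
gain_detect = 10.0   # benefit of a true detection
cost_false = 1.0     # cost of prosecuting a false alarm

def expected_payoff(p_fa, p_d):
    return p_target * p_d * gain_detect - (1 - p_target) * p_fa * cost_false

best = max(roc, key=lambda point: expected_payoff(*point))
print(best)  # the operating point balancing detections against false alarms
```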
Finally, Table 2 provides a classification of the aforementioned contributions, as a function of the solution scheme employed by each of them.

Rendezvous Games
The naive version of rendezvous games, i.e., those involving only pursuers and targets, admits an interpretation as a corner case of pursuit-evasion games. In fact, one can see a fixed target as an evader which adopts the dumb strategy of staying still. Therefore, the reachability approaches discussed in Section 3 are applicable as well. For instance, [18,19,21] leverage these techniques for time-optimal path planning tasks and [59] extends the previous works to stochastic scenarios, while [60] also considers the problem of risk minimization when dealing with uncertainties. Other performance indicators are reasonable in addition to time; e.g., in [61], the authors consider the problem of energy-consumption optimization. In [29], the authors compute the optimal route to a moving target in a highly dynamic ocean environment with tides, strong currents, and wind and wave forcing. Using a level-set approach, they successfully guide the time-optimal vehicles through regions with the most favorable currents, avoiding islands and regions with adverse effects, and accounting for the ship wakes when present. In [62], the authors study the issue of the optimal deviation from a planned path in case of encounters between ships. The authors propose two ways to model the problem: as a cooperative game and as a non-cooperative game. In particular, the latter falls into the category of zero-sum games, while both approaches are matrix-based; i.e., their optimal solution is a discretization of the continuous one. In both cases, the authors leveraged dual linear programming to find the Nash equilibria. Similarly, but in a coarser way, [63] proposes an optimal path deviation due to mine encounters. The methodology considers the dynamics of the mines induced by sea currents, along with deviation quality indexes (e.g., distance from target, presence of obstacles, etc.), to build a matrix zero-sum game against Nature. Linear programming is used to solve this problem as well.
In [64], the authors propose a strategy for a multiple UUVs rendezvous task in three-dimensional space. The goal is to make multiple UUVs starting at different positions converge at the same point (not necessarily simultaneously). The autonomous agents periodically exchange information about their own position through a distributed network whose topology is fixed a priori.
The proposed approach is a distributed optimization algorithm based on cooperative game theory and bargaining game theory: this means that even if the goal is common and the inter-vehicle communication must be preserved, each agent behaves in a way that seeks to optimize their own selfish utility; e.g., the minimization of fuel consumption. At each time step, the vehicles exchange information and evaluate the probity of the others as a deviation from the common goal. Then, the waypoint tracking control of a single UUV is designed in accordance with the potential game framework, which outputs the optimal strategy (the temporary point to reach) considering both the selfish interests and the neighbors' probity. Finally, Table 3 classifies the aforementioned contributions as a function of the solution scheme employed by each of them.

Table 3. Approaches to solving rendezvous games and papers which use them.

Coordination Games
In [65], the authors present an approach to AUV multi-vehicle coordination and cooperation based on the formalism of potential game theory. The work shows how very simple potential games can be used to stably steer an AUV formation to the position that best compromises between the target destination of each vehicle and the preservation of communication capabilities among the vehicles. To obtain such a goal, the authors leverage the preliminary results in [66,67], where mechanisms to enforce cooperation among AUVs are designed. In [68], a coordination control protocol is developed implementing a modified version of the Distributed Inhomogeneous Synchronous Learning algorithm [69] that is able to cope with highly dynamic environments. The proposed modification allows robots to react efficiently to environmental changes; as a consequence, teams of unmanned marine robots can track a threat without knowing its behavior a priori, as in the case of asymmetric threats. Furthermore, the authors implement a tool for team sizing: given the maximum threat velocity, the tool determines the minimum number of marine robots to be used in the system to guarantee the desired security level of the area. Conversely, in [70], the very same algorithm and the payoff-based Homogeneous Partially Irrational Play [71] are extended to the case of low-dynamic environments. This extension transforms the algorithms from action-oriented to trajectory-oriented optimizers. This allows them to deal with antagonistic goals; e.g., scenarios where intruders have to be tracked while patrolling the area around a reference ship. In [72], the results in [65] are improved by proposing a distributed control algorithm that guarantees stability in the large of the equilibrium points rather than just local stability. The work is built upon the well-known artificial potential methodology, but the innovative element is the use of passivity theory [73].
The study in [74] overcomes a significant limitation of the previous approach; i.e., the static topology of the AUV communication network. The behavior of the group is made more flexible, with arbitrary split and join events, using an "energy tank" that is able to store and supply energy whenever required: exploiting this further passive element, the graph topology may change depending on the emerging needs of the mission. Finally, [75] presents a general framework for coordinating a team of AUVs, mainly based on the previous two works. A very interesting aspect is the point of contact the authors highlight between their approach to model and control a network of agents and the peculiar class of potential games known as BSI games [76]. On the same basis, [77] moves towards a novel interpretation of physical, multi-agent systems admitting a port-Hamiltonian representation [78], providing a potential game-theoretic perspective. This paves the way for further studies and applications to autonomous path planning in marine environments.
In [79], yet another coordination problem is considered. The goal of the UUV swarm is to keep a predefined formation while traveling towards a target in a leader-followers fashion. The control is realized by means of a mixed-value logic network, leveraging the semi-tensor product between matrices to cope with the huge amount of data that may be collected during the journey. This work inspired some advances, as reported in [64,80,81]. In particular, [64] is reviewed in Section 5.
Finally, level-set methods have also been used in this context to maintain the formation of ocean vehicles in dynamic environments [31]. After developing the theory, the authors provide realistic examples of groups of vehicles maintaining the shape of dynamic equilateral triangles, even though the vehicles operate in highly dynamic ocean simulations of the complex Philippines Archipelago. Table 4 classifies the aforementioned contributions as a function of the solution scheme employed by each of them. Table 4. Approaches to solving coordination games and papers which use them.

Patrolling Games
Modern game-theoretic approaches to maritime patrolling started with the USA Office of Naval Research Technical Report [82]. The authors consider the task of ensuring secure transit in an area populated by three different classes of agents: vessels, patrollers, and pirates. To properly model the game, one should consider the interaction among the three classes of players altogether. Since this is a complex problem, the authors suggest analyzing the players in pairs and solving iteratively in order to converge to a steady equilibrium. Each pairwise game is far simpler than the overall model since it assumes the third player's strategy to be fixed and given a priori; the next lines briefly summarize them. The interaction between vessels and pirates is termed by the authors the transit game, and it is modeled as a zero-sum game where the former seek a randomized strategy over feasible start-to-end paths in order to minimize the probability of being captured. On the other hand, the pirates are constrained to closed-loop trajectories which need to be optimized in order to maximize the probability of successful attacks without being intercepted by patrollers located according to a given distribution. The transit grouping game is the name given to the cooperative sub-game involving only vessels and patrollers. The goal is to create optimal groups based on their characteristics (e.g., speed) and their preferences (e.g., deadlines for cargo delivery). Finally, the patrolling game is the zero-sum game where the patrollers face pirates, seeking a time-dependent policy that minimizes the maximal probability that some vessel is left unvisited. The work inspired several subsequent studies [83,84] and real-world simulations (Indian Ocean and Gulf of Aden) [85][86][87].
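In its simplest abstraction (a vessel choosing between two routes against a pirate choosing between two ambush locations) the transit game is a 2x2 zero-sum game whose mixed equilibrium has a closed form. A minimal sketch, with purely illustrative safe-transit probabilities:

```python
def solve_2x2_zero_sum(M):
    """Mixed equilibrium of a 2x2 zero-sum game with no saddle point.
    M[i][j] is the row player's payoff (e.g., probability of safe transit
    when the vessel picks route i and the pirate ambushes location j)."""
    (a, b), (c, d) = M
    denom = (a - b) + (d - c)   # nonzero when no pure saddle point exists
    p = (d - c) / denom         # row player's probability of strategy 0
    q = (d - b) / denom         # column player's probability of strategy 0
    value = (a * d - b * c) / denom
    return p, q, value
```

For instance, with safe-transit probabilities `[[0.9, 0.4], [0.3, 0.8]]`, the vessel mixes its two routes 50/50 and transits safely with probability 0.6 at equilibrium. Note the formula is only valid when neither player has a saddle point; larger route/ambush sets require a linear-programming solution.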
In [88][89][90][91], the authors introduce and illustrate in detail PROTECT, a game-theoretic system deployed by the United States Coast Guard in the port of Boston for scheduling its patrols. The system is based on an attacker-defender Stackelberg game model and offers two key innovations. First, it rejects the assumption of perfect adversary rationality made in previous works, relying instead on a quantal response model [12,13] of the adversary's behavior, which is known to better capture human decision making. Second, it leverages a compact representation of the defender's strategy space by exploiting equivalence and dominance notions, which makes PROTECT efficient enough to solve real-world-sized problems. Experimental results on real data illustrate that PROTECT's quantal response model handles real-world uncertainties more robustly than a perfect-rationality model. In [92], the authors revisit the Stackelberg game model widely adopted for security purposes for the case of non-stationary targets. As examples of mobile targets, the authors mention the escorting of ferries transiting dangerous areas and the protection of refugee supply lines. The contribution of the work is fourfold: it proposes a new game model for multiple mobile defender resources and moving targets, with a discretized strategy space for the defender and a continuous strategy space for the attacker; it implements an efficient linear-programming-based solution that uses a compact representation of the defender's mixed strategy while accurately modeling the attacker's continuous strategy through a novel sub-interval analysis method; it discusses and analyzes multiple heuristic methods for equilibrium refinement to improve the robustness of the defender's strategy; and it describes approaches to sample actual defender schedules from the defender's mixed strategy. A detailed experimental analysis of the algorithm in the ferry protection domain supports the work.
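The quantal response idea is simple to state: instead of always attacking the best target, the adversary picks each target with probability growing exponentially in its expected utility. A minimal sketch (the rationality parameter and utilities are illustrative assumptions, not PROTECT's calibrated model):

```python
import math

def quantal_response(utilities, lam=1.0):
    """Attack distribution of a boundedly rational adversary: the chance of
    choosing a target grows exponentially with its expected utility.
    lam = 0 gives uniform random play; lam -> infinity approaches a
    perfectly rational best response."""
    weights = [math.exp(lam * u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]
```

A defender can then choose the coverage that minimizes expected loss against this smoothed response rather than against a worst-case best responder, which is what makes the model more robust to human deviations from rationality.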
Stackelberg games also find applications in naval resource allocation for the prevention of illegal fishing. In [93], the authors implement a game-theoretic algorithm, ComPASS (Conservative Online Patrol Assistant), based on repeated Stackelberg games, which performs well even with scarce statistical data about the opponents. Its peculiarity is that it combines robust optimization and learning to exploit the available data when updating its recommendations. The algorithm proves robust with respect to heterogeneous illegal fishers with bounded rationality when tested on the real environment of the Gulf of Mexico. The proposed approach has two limiting assumptions: the attacker's full observability of the defender's mixed strategy before each attack and a lag-free observation procedure. In [94], the authors drop these unrealistic assumptions and generate better-performing defending strategies by introducing the so-called Green Stackelberg games.
Furthermore, [95] adopts Stackelberg games to implement efficient patrol strategies, this time with the purpose of protecting coral reef ecosystems. The methodology first represents the environment to patrol by constructing a transition graph with a timeline; then, it overcomes the issues caused by the exponential growth of the defender's pure strategies by proposing a compact reformulation of the mixed strategies and solving the problem by means of a compact linear program [96]. The latter has as many constraints as attacker pure strategies, a number which also grows exponentially. To overcome this further issue, the authors resort to a compact-strategy double-oracle algorithm on graphs [97], a procedure which starts by solving a sub-game involving only a very small subset of each player's pure strategies. It then expands a player's strategy set only if a unilateral deviation to an unconsidered strategy is profitable for the deviant. The final solution is provably also an equilibrium of the original game [98]. Exploiting the underlying graph structure allows one to further speed up the computations. In [99], the authors introduce two further elements to the problem of patrolling a dangerous area with moving targets: projections in time and sub-area criticality. The former endows attackers and patrollers with the ability to make decisions based not only on the current situation but also on the near- and mid-term expected evolution of the scenario. The latter allows one to consider scenarios where some areas are preferable for an attack; e.g., due to the presence of support structures, refuges, etc. Simulations in the Gulf of Aden attest to the efficacy of the approach. The case of dynamic targets is also studied in [100], where a two-level Stackelberg repeated game is used to model the patroller-attacker interaction.
The authors adopt a Bayesian approach to represent the uncertainty about the opponent's preferences caused by the dynamic scenario. However, the reasoning only considers the current positions of the vessels; i.e., it lacks any projection into the future. The temporal component affects the evolution of the game simply because the strategies are periodically recomputed, each time from a static representation of the scenario. In [101], the authors propose a new model for computing effective patrol strategies in Stackelberg games, showing its efficacy in a naval simulation. It leverages the extraproximal method [102] and its extension to Markov chains, within which the unique Stackelberg/Nash equilibrium of the game is explicitly computed. Following the Kullback-Leibler divergence, the players' actions are fixed and the next-state distribution of the process is computed. The authors provide convergence guarantees under very mild hypotheses on attackers and defenders.
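The double-oracle scheme used by several of the patrolling approaches above can be illustrated on a small zero-sum matrix game. The sketch below uses fictitious play as an approximate restricted-game solver; the payoff matrix (a patroller maximizing, an attacker minimizing) and iteration budget are illustrative assumptions, not the compact-strategy graph algorithm of [97]:

```python
def fictitious_play(M, rows, cols, iters=2000):
    """Approximate equilibrium of the zero-sum subgame restricted to
    the given row/column strategy subsets (row player maximizes)."""
    rc = {i: 0 for i in rows}
    cc = {j: 0 for j in cols}
    r_cur, c_cur = rows[0], cols[0]
    for _ in range(iters):
        rc[r_cur] += 1
        cc[c_cur] += 1
        # each player best-responds to the opponent's empirical mixture
        r_cur = max(rows, key=lambda i: sum(cc[j] * M[i][j] for j in cols))
        c_cur = min(cols, key=lambda j: sum(rc[i] * M[i][j] for i in rows))
    return ({i: rc[i] / iters for i in rows},
            {j: cc[j] / iters for j in cols})

def double_oracle(M):
    """Grow each player's strategy set with best responses to the current
    restricted-game solution until no profitable deviation remains."""
    rows, cols = [0], [0]
    while True:
        p, q = fictitious_play(M, rows, cols)
        br_row = max(range(len(M)),
                     key=lambda i: sum(q[j] * M[i][j] for j in q))
        br_col = min(range(len(M[0])),
                     key=lambda j: sum(p[i] * M[i][j] for i in p))
        if br_row in rows and br_col in cols:
            return p, q   # neither player benefits from an unused strategy
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)
```

The appeal is that the equilibrium often uses only a few of the exponentially many pure strategies, so the restricted games stay small even when the full strategy spaces are enormous.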
The Stackelberg game-based approaches and applications reviewed so far fall into the broader and recently very fruitful category of Stackelberg Security Games [103,104], which find applications ranging from fighting poaching [105] to auditing companies [106,107] and software testing [108]. Table 5 classifies the aforementioned contributions as a function of the solution scheme employed by each of them. Table 5. Approaches to solving patrolling games and papers which use them.

Opportunities and Way Ahead
Concerning opportunities and the way ahead, an appealing research direction is multi-objective game-theoretic path planning, especially in the presence of priorities. For instance, in coordination games, more than one formation of AUVs may be acceptable, but the formation to adopt depends on both environmental feasibility and a strict preference relation. This means that the AUVs organize in accordance with the preferred formation until some change in external conditions (the maneuver space shrinks, the environment becomes hostile, etc.) forces the swarm to adopt the second-preferred formation. Another scenario may involve a search planning game where the players have to find more than one type of object (e.g., mines and shipwrecks) ordered by priority: the overall task may be to search for shipwrecks, but the identification of underwater mines is crucial for the safety of both the search and the subsequent immersions. In this case, among all the paths that equally investigate the presence of mines, the one chosen must also maximize the information gained about the presence of shipwrecks. A further game that could benefit from a multi-objective (possibly prioritized) approach is the rendezvous game, where time and energy efficiency can be considered together with other performance indicators, such as the complexity of driving. In addition, patrolling games seem to naturally admit multiple objectives; e.g., the supervision of a certain area may concern the safety of a set of targets, some of which are more important than others. Related examples involve planning time-optimal missions of marine vehicles that visit a number of locations in highly dynamic ocean currents [109]. In this work, the authors solve realistic naval optimization problems (e.g., the fastest inspection of multiple shipwrecks and harbors, as well as the clearance of multiple mines) in the highly complex ocean region of the Philippines Archipelago.
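The prioritized selection underlying these scenarios can be prototyped as successive filtering: optimize the highest-priority objective first, then break ties with the next one, and so on. A minimal sketch, where the candidate paths and their scores (mine coverage first, shipwreck information second) are invented for illustration:

```python
def lexicographic_best(candidates, objectives, tol=1e-9):
    """Filter candidates by each objective in strict priority order,
    keeping only (near-)optimal ones before applying the next objective."""
    pool = list(candidates)
    for score in objectives:
        best = max(score(c) for c in pool)
        pool = [c for c in pool if score(c) >= best - tol]
    return pool

# hypothetical search paths: mine coverage has priority over wreck info
paths = {
    "A": {"mine_coverage": 0.9, "wreck_info": 0.2},
    "B": {"mine_coverage": 0.9, "wreck_info": 0.7},
    "C": {"mine_coverage": 0.6, "wreck_info": 0.9},
}
chosen = lexicographic_best(
    paths,
    [lambda k: paths[k]["mine_coverage"],
     lambda k: paths[k]["wreck_info"]],
)
```

Here path C is discarded despite its high shipwreck score because mine coverage dominates lexicographically, and B then beats A on the secondary objective; the tolerance parameter mimics the relaxed lexicographic orderings used in numerical schemes.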
Recent advances in multi-objective prioritized optimization, whether lexicographic [110][111][112] or Pareto-lexicographic [113][114][115], have produced programming tools and algorithms that have given new life to the study of lexicographic game theory [116,117]. As a consequence, the numerical study and solution of the practical problems mentioned above now seem within reach, possibly paving the way for a new approach to autonomous marine path planning.

Conclusions
This work reviewed the state of the art of game theory-based and game theory-related path planning techniques in the marine domain. In doing so, a categorization of maritime tasks as game-theoretic models was first provided; then, for each category, the most relevant contributions were reviewed and discussed. Here, relevance refers to the novelty and importance of the studied scenario, the peculiarity of the technique adopted, or the superiority of the results achieved. Part of the effort was dedicated to providing glimpses of research opportunities and promising results. In particular, the use of multi-objective optimization for multi-task path planning arises quite naturally. Examples include shipwreck searching in mined regions as well as multi-target patrolling and multi-formation autonomous vehicle coordination, to mention only a few applications. Moreover, advanced numerical schemes may be of significant help in the case of prioritized tasks, as testified by the literature. Indeed, remarkable applications have been found in car and general aviation aircraft design, lexicographic optimization, and lexicographic game theory. In summary, the interaction between (prioritized) multi-objective game-theoretic path planning and these advanced numerical schemes seems quite synergistic and may yield fruitful results in the near future.
Author Contributions: The authors contributed equally to this work in all of its phases. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: