Mean-Payoff Games with ω-Regular Specifications

Abstract: Multi-player mean-payoff games are a natural formalism for modelling the behaviour of concurrent and multi-agent systems with self-interested players. Players in such a game traverse a graph, while attempting to maximise a (mean-)payoff function that depends on the play generated. As with all games, the equilibria that could arise may have undesirable properties. However, as system designers, we typically wish to ensure that equilibria in such systems correspond to desirable system behaviours, for example, satisfying certain safety or liveness properties. One natural way to do this would be to specify such desirable properties using temporal logic. Unfortunately, the use of temporal logic specifications causes game-theoretic verification problems to have very high computational complexity. To address this issue, we consider ω-regular specifications. These offer a concise and intuitive way of specifying system behaviours with a comparatively low computational overhead. The main results of this work are characterisation and complexity bounds for the problem of determining if there are equilibria that satisfy a given ω-regular specification in a multi-player mean-payoff game in a number of computationally relevant game-theoretic settings.


Introduction
Modelling concurrent and multi-agent systems as games in which players interact by taking actions in pursuit of their preferences is an increasingly common approach in both formal verification and artificial intelligence [1][2][3]. One widely adopted semantic framework for modelling such systems is that of concurrent game structures [1]. Such structures capture the dynamics of a system: the actions that agents/players can perform, and the effects of these actions. On top of this framework, we can introduce additional structure to represent each player's preferences over the possible paths of the system. Several approaches have been proposed for this purpose. One very natural method involves assigning a weight to every state of the game, and then considering each player's mean-payoff over generated paths: a player prefers paths that maximise their mean-payoff [4][5][6]. These games are effective in modelling resource-bounded reactive systems, as well as any scenario with multiple agents and quantitative features. Under the assumption that each agent in the system is acting rationally, concepts from game theory offer a natural framework for understanding its possible behaviours [7]. This approach is (relatively) computationally tractable [5], expressive enough to capture applications of interest, and has been receiving increasing attention recently [8]. As such, equilibria for multi-player games with mean-payoff objectives have been studied, and the computation of Nash equilibria in such games has been shown to be NP-complete [5].
However, it is well known in the game theory literature that equilibria may have undesirable properties. In the context of our setting, for example, an equilibrium may visit dangerous system states, or lead to a deadlock. Thus, one may also want to check if there exist equilibria which satisfy some additional desirable computational properties associated with the game. This decision problem (that is, determining whether a given formal specification is satisfied on some, or every, equilibrium of a given multi-agent system modelled as a multi-player game) is known as Rational Verification [9,10].
Previous approaches to rational verification have borrowed their methodology from temporal logic model checking, appealing to logics such as Linear Temporal Logic (LTL) [11] and Computation Tree Logic (CTL) [12]. However, since rational verification subsumes automated synthesis, the use of temporal logic specifications introduces high computational complexity [13]. To mitigate this problem, one might use fragments of temporal logic with lower complexity (e.g., Generalised Reactivity(1), or GR(1), formulae [14,15]), but in this work we adopt a different approach. Taking inspiration from automata theory, and in particular from [16], we consider system specifications given by a formal language for expressing ω-regular specifications, defined in terms of those states in the system that are visited infinitely often. With this approach, the complexity of the main game-theoretic decision problems is considerably lower than is the case with temporal logic specifications.
In this paper, we offer the following main contributions. We begin by introducing a language for ω-regular specifications and demonstrate that they form a natural construct for representing properties of concurrent games. In Section 3, we study multi-player mean-payoff games with ω-regular specifications in the non-cooperative setting [7], and consider the natural decision problems relating to these games and their Nash equilibria. Following this, in Section 4 we consider a cooperative solution concept derived from the core [7,17]. Finally, in Section 5 we look at reactive module games [18] as a way of succinctly representing systems, and investigate how the use of this representation affects our complexity results. We conclude with a discussion of related work in Section 6, before offering a glossary of terms, acronyms and notation used within the paper.

Games
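A concurrent game structure [1] is a tuple, $M = (\mathrm{Ag}, \mathrm{St}, s_0, (\mathrm{Ac}_i)_{i \in \mathrm{Ag}}, \mathrm{tr})$, where:
• $\mathrm{Ag}$ and $\mathrm{St}$ are finite, non-empty sets of agents and system states, respectively, where $s_0 \in \mathrm{St}$ is an initial state;
• For each $i \in \mathrm{Ag}$, $\mathrm{Ac}_i$ is a set of actions available to agent $i$;
• $\mathrm{tr} : \mathrm{St} \times \mathrm{Ac}_1 \times \cdots \times \mathrm{Ac}_{|\mathrm{Ag}|} \to \mathrm{St}$ is a transition function.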
We define the size of $M$ to be $|\mathrm{St}| \cdot |\mathrm{Ac}|^{|\mathrm{Ag}|}$. Concurrent games are played as follows. The game begins in state $s_0$, and each player $i \in \mathrm{Ag}$ simultaneously picks an action $ac^0_i \in \mathrm{Ac}_i$. The game then transitions to a new state, $s_1 = \mathrm{tr}(s_0, ac^0_1, \ldots, ac^0_{|\mathrm{Ag}|})$, and this process repeats. Thus, the $n$th state visited is $s_n = \mathrm{tr}(s_{n-1}, ac^{n-1}_1, \ldots, ac^{n-1}_{|\mathrm{Ag}|})$. Since the transition function is deterministic, a play of a game will be an infinite sequence of states, $\pi : \mathbb{N} \to \mathrm{St}$. We call such a sequence of states a path. Typically, we index paths with square brackets, i.e., the $k$th state visited in the path $\pi$ is denoted $\pi[k]$, and we also use slice notation to denote prefixes, suffixes and fragments of paths. That is, we use $\pi[m..n]$ to mean $\pi[m]\pi[m+1] \ldots \pi[n-1]$, $\pi[..n]$ for $\pi[0]\pi[1] \ldots \pi[n-1]$ and $\pi[m..]$ for $\pi[m]\pi[m+1] \ldots$.
Now, consider a path $\pi$. We say that $\pi$ visits a state $s$ if there is some $k \in \mathbb{N}$ such that $\pi[k] = s$. Since there are only finitely many states, some must be visited infinitely often. Furthermore, unless all states are visited infinitely often, there will also exist some set of states that are visited only finitely often. Thus, given a path $\pi$, we can define the following two sets, which one can use to define objectives over paths: $\mathrm{Inf}(\pi) = \{s \in \mathrm{St} \mid \pi \text{ visits } s \text{ infinitely often}\}$ and its complement $\mathrm{Fin}(\pi) = \mathrm{St} \setminus \mathrm{Inf}(\pi)$.
[…] and player 1 (respectively, 2) chooses edges to try and maximise (respectively, minimise) the mean-payoff of $w$, denoted $\mathrm{mp}(w)$, where for $\beta \in \mathbb{Z}^{\omega}$, we have:
$$\mathrm{mp}(\beta) = \liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \beta_i.$$
There are two key facts about two-player, mean-payoff games that we shall use without comment throughout. The first is that memoryless strategies suffice for both players to act optimally (i.e., achieve their maximum payoff) [4]. The second is that every game has a value (i.e., a payoff that player 1 can achieve regardless of what player 2 plays) and determining if a game's value is equal to $v$ lies in NP ∩ co-NP [6]. In particular, given a game, determining its value can be seen as a problem that lies within TFNP [19].
Extending two-player, mean-payoff games to multiple players, a multi-player mean-payoff game [5] is given by a tuple, $G = (M, (w_i)_{i \in \mathrm{Ag}})$, where $M$ is a concurrent game structure and for each agent $i \in \mathrm{Ag}$, $w_i : \mathrm{St} \to \mathbb{Z}$ is a weight function. In a multi-player mean-payoff game, a path $\pi = s_0 s_1 \ldots$ induces an infinite sequence of weights for each player, $w_i(s_0) w_i(s_1) \ldots$. We denote this sequence by $w_i(\pi)$. Under a given path, $\pi$, a player's payoff is given by $\mathrm{mp}(w_i(\pi))$. For notational convenience, we will write $\mathrm{pay}_i(\pi)$ for $\mathrm{mp}(w_i(\pi))$. We can then define a preference relation over paths for each player as follows: $\pi \succeq_i \pi'$ if and only if $\mathrm{pay}_i(\pi) \geq \mathrm{pay}_i(\pi')$. We also write $\pi \succ_i \pi'$ if $\pi \succeq_i \pi'$ and not $\pi' \succeq_i \pi$. Note that, since strategy profiles $\vec{\sigma}$ induce unique plays $\pi(\vec{\sigma})$, we can lift preference relations from plays to strategy profiles, for example writing $\vec{\sigma}_1 \succeq_i \vec{\sigma}_2$ as a shorthand for $\pi(\vec{\sigma}_1) \succeq_i \pi(\vec{\sigma}_2)$.
In what follows, we refer to multi-player, mean-payoff games simply as mean-payoff games, and refer to two-player, mean-payoff games explicitly as such.
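For ultimately periodic paths, the quantities defined in this section can be computed directly. The following Python sketch assumes that a path is given as a finite prefix followed by a cycle that repeats forever; this representation, and the names `inf_states`, `fin_states` and `mean_payoff`, are illustrative choices and do not come from the paper. For such a path the running averages converge, so the mean-payoff is simply the average weight over the cycle, and the set of states visited infinitely often is the set of states on the cycle.

```python
from fractions import Fraction

def inf_states(prefix, cycle):
    """States visited infinitely often by the path prefix . cycle^omega."""
    return set(cycle)

def fin_states(states, prefix, cycle):
    """Complement of Inf: states visited only finitely often (possibly never)."""
    return set(states) - inf_states(prefix, cycle)

def mean_payoff(weight, prefix, cycle):
    """Mean-payoff of prefix . cycle^omega; the finite prefix vanishes in the limit."""
    return Fraction(sum(weight[s] for s in cycle), len(cycle))

# Hypothetical three-state example: the path is s0 (s1 s2)^omega.
states = {"s0", "s1", "s2"}
weight = {"s0": 5, "s1": 1, "s2": 2}
prefix, cycle = ["s0"], ["s1", "s2"]
print(mean_payoff(weight, prefix, cycle))  # 3/2
```

Exact rationals are used so that payoffs can later be compared without floating-point error.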

Solution Concepts
To analyse our games, we make use of solution concepts from both the non-cooperative and cooperative game theory literatures. With respect to non-cooperative solution concepts, a strategy profile $\vec{\sigma}$ is said to be a Nash equilibrium [20,21] if for all players $i$ and strategies $\sigma'_i$, we have $\vec{\sigma} \succeq_i (\vec{\sigma}_{-i}, \sigma'_i)$. Informally, a Nash equilibrium is a strategy profile from which no player has any incentive to unilaterally deviate. In addition to Nash equilibrium, we also consider the cooperative solution concept known as the core [17,22]. While Nash equilibria are profiles that are resistant to unilateral deviations, the core consists of profiles that are resistant to those deviations by coalitions of agents, where every member of the coalition is better off, regardless of what the rest of the agents do. Formally, we say that a strategy profile, $\vec{\sigma}$, is in the core if for all coalitions $C \subseteq \mathrm{Ag}$ and strategy vectors $\vec{\sigma}'_C$, there is some complementary strategy vector $\vec{\sigma}'_{\mathrm{Ag} \setminus C}$ such that $\vec{\sigma} \succeq_i (\vec{\sigma}'_C, \vec{\sigma}'_{\mathrm{Ag} \setminus C})$, for some $i \in C$. Given a game $G$, let $\mathrm{NE}(G)$ denote the set of Nash equilibrium strategy profiles of $G$, and let $\mathrm{CORE}(G)$ denote the set of strategy profiles in the core of $G$.
It is worth noting that if a strategy profile is not a Nash equilibrium, then at least one player can deviate and be better off, under the assumption that the remainder of the players do not change their actions. However, if a strategy profile is not in the core, then some coalition can deviate and become better off, regardless of what the other players do. Thus, the core should not be confused with the solution concept of strong Nash equilibrium: a strategy profile that is stable under multilateral deviations, assuming the remainder of the players 'stay put' [23,24]. We will not use strong Nash equilibria in this work, and only mention the concept in order to emphasise how strong Nash equilibria are different from the core.

ω-Regular Specifications
In [16], Boolean combinations of atoms of the form $\mathrm{Inf}(F)$ are used to describe acceptance conditions of arbitrary ω-automata. We use this approach to specify system properties for our games. Formally, the language of ω-regular specifications, $\alpha$, is defined by the following grammar:
$$\alpha ::= \mathrm{Inf}(F) \mid \neg\alpha \mid \alpha \wedge \alpha,$$
where $F$ ranges over subsets of $\mathrm{St}$. For notational convenience, we write $\mathrm{Fin}(F)$ as shorthand for $\neg\mathrm{Inf}(F)$, $\mathrm{Inf}(\overline{F})$ for $\mathrm{Inf}(\mathrm{St} \setminus F)$, and we define disjunction, $\cdot \vee \cdot$, implication, $\cdot \to \cdot$, and bi-implication, $\cdot \leftrightarrow \cdot$, in the usual way. The size of a specification is simply the sum of the sizes of the sets within its atoms.
We now talk about what it means for a path to model a specification. Let $\pi$ be a path, $F$ be a subset of $\mathrm{St}$ and $\alpha, \beta$ be arbitrary ω-regular specifications. Then,
• $\pi \models \mathrm{Inf}(F)$ if and only if $\mathrm{Inf}(\pi) \cap F \neq \emptyset$;
• $\pi \models \neg\alpha$ if and only if it is not the case that $\pi \models \alpha$;
• $\pi \models \alpha \wedge \beta$ if and only if $\pi \models \alpha$ and $\pi \models \beta$.
Note that we use Inf in two different, but interrelated senses. First, we use it as an operator over paths, as in $\mathrm{Inf}(\pi)$, to denote the set of states visited infinitely often in a path $\pi$, but we also use it as an operator over sets, as in $\mathrm{Inf}(F)$, as an atom in the specifications just defined. The semantics of the latter are defined in terms of the former. We will use these interchangeably: usage will be clear from the context. Using this notation, we can readily define conventional ω-regular winning conditions, as follows:

Type | Associated Sets | ω-Regular Specification
Büchi | … | …

Our ω-regular specifications are equivalent to Emerson-Lei conditions [25], albeit with a different syntax. We can in fact represent all possible ω-regular winning conditions. As another example, consider parity conditions. Suppose each state is labelled by a function, $\mu : \mathrm{St} \to \mathbb{N}$, with $\mu(s) \leq m$ for some $m \in \mathbb{N}$, for all $s \in \mathrm{St}$. Given a path, $\pi$, the traditional parity condition is satisfied when $\min\{\mu(s) \mid s \in \mathrm{Inf}(\pi)\}$ is odd. The sets of interest are defined by: […] Then, assuming $m$ is odd (the formula for $m$ even is similar), the parity condition can be expressed by the following specification: […]
With ω-regular specifications defined, we can talk about them in the context of games. Let $\vec{\sigma}$ be some strategy profile. Then, $\vec{\sigma}$ induces some path, $\pi(\vec{\sigma})$, and given that ω-regular specifications are defined on paths, we can talk about paths induced by strategies modelling specifications. However, we are not interested in whether the paths induced by arbitrary strategies model a given specification: it is more natural in the context of multi-player games to ask whether the paths induced by some or all of the equilibria of a game model a specification, both in the non-cooperative and in the cooperative contexts. In particular, we are interested in the Nash equilibria and the core, and whether the paths induced by strategy profiles that form an instance of these solution concepts model a specification.
Example 1. Suppose we have four delivery robots in a warehouse (given by the coloured triangles in Figure 1), who want to pick up parcels at the pickup points (labelled by the bold Ps) and drop them off at the delivery points (labelled by the bold Ds). If a robot is not holding a parcel, and goes to a pickup point, it is automatically given one. If it has a parcel, and goes to the delivery point, then it loses the parcel, and gains a payoff of 1.
Furthermore, if two robots collide, by entering the same node at the same time, then they crash, and get a payoff of −999 at every future timestep. Now, there are a number of Nash equilibria here (infinitely many, in fact), but it is easy to see that many of them exhibit undesirable properties. For instance, consider the strategy profile where red and pink go back and forth between the pickup and delivery points, and threaten to crash into, or deadlock, blue and yellow if they move from their starting positions. This is a Nash equilibrium, but is clearly not Pareto optimal, and so it is a socially undesirable outcome.
It is easy to identify the most socially desirable outcome: all four robots visiting the pickup and delivery points infinitely often, waiting for the others to pass when they reach bottleneck points. If we call the set containing the two states where robot $i$ visits a pickup point $P_i$ and similarly label the set of delivery points $D_i$, we can express this condition concisely with the following ω-regular specification:
$$\bigwedge_{i \in [4]} \mathrm{Inf}(P_i) \wedge \mathrm{Inf}(D_i). \qquad (1)$$
Thus, we can conclude that there exists some Nash equilibrium which models the above (Generalised Büchi) specification. However, we just did this by inspection. In practice, we would like to ask this question in a more principled way. As such, we will spend the rest of this paper exploring the natural decision problems associated with mean-payoff games with ω-regular specifications.
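Since whether a path models a specification depends only on the set of states it visits infinitely often, the semantics of this section can be evaluated mechanically once that set is known. The Python sketch below is one possible encoding, not the paper's: the class names `Inf`, `Not`, `And`, the helpers `Fin` and `Or`, and the state names in the example are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

@dataclass(frozen=True)
class Inf:
    states: FrozenSet[str]  # the set F in the atom Inf(F)

@dataclass(frozen=True)
class Not:
    arg: "Spec"

@dataclass(frozen=True)
class And:
    left: "Spec"
    right: "Spec"

Spec = Union[Inf, Not, And]

def Fin(states):
    """Fin(F) is shorthand for the negation of Inf(F)."""
    return Not(Inf(frozenset(states)))

def Or(a, b):
    """Disjunction defined from negation and conjunction in the usual way."""
    return Not(And(Not(a), Not(b)))

def models(inf_of_path, spec):
    """Decide whether a path models spec, given only the set Inf(path)."""
    if isinstance(spec, Inf):
        return bool(inf_of_path & spec.states)
    if isinstance(spec, Not):
        return not models(inf_of_path, spec.arg)
    if isinstance(spec, And):
        return models(inf_of_path, spec.left) and models(inf_of_path, spec.right)
    raise TypeError(f"unknown specification node: {spec!r}")

def size(spec):
    """Sum of the sizes of the sets within the atoms of spec."""
    if isinstance(spec, Inf):
        return len(spec.states)
    if isinstance(spec, Not):
        return size(spec.arg)
    return size(spec.left) + size(spec.right)

# A generalised Buchi condition for one robot with hypothetical states "p" and "d".
gen_buchi = And(Inf(frozenset({"p"})), Inf(frozenset({"d"})))
print(models(frozenset({"p", "d", "corridor"}), gen_buchi))  # True
print(models(frozenset({"p"}), gen_buchi))                   # False
```

Evaluation is linear in the size of the specification, because each atom only requires an intersection test against the set of recurring states.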

Mean-Payoff Games with ω-Regular Specifications
Given that we have proposed ω-regular specifications as an alternative to LTL [11] specifications, it is natural to ask how they compare. The connection between them is given by the following statement:
Proposition 1. Let $G$ be a game and let $\alpha$ be some ω-regular specification. Then there exists a set of atomic propositions, $\Phi$, a labelling function $\lambda : \mathrm{St} \to \mathcal{P}(\Phi)$, and an LTL formula $\varphi$ such that, for all paths $\pi$, we have $\pi \models \alpha$ if and only if $\lambda(\pi) \models \varphi$.
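Proof. Without loss of generality, we may assume that $\alpha$ is written in conjunctive normal form, that is:
$$\alpha = \bigwedge_{i=1}^{n} \bigvee_{j=1}^{m} C_{i,j},$$
where each $C_{i,j}$ is an atom of the form $\mathrm{Inf}(F)$ or $\mathrm{Fin}(F)$ for some subset $F \subseteq \mathrm{St}$. We start by introducing a propositional variable $p_F$ for every subset $F \subseteq \mathrm{St}$. Then, for a given state $s \in \mathrm{St}$, we define:
$$\lambda(s) = \{p_F \mid s \in F\}.$$
Then, we simply define:
$$\varphi = \bigwedge_{i=1}^{n} \bigvee_{j=1}^{m} D_{i,j}, \quad \text{where } D_{i,j} = \mathbf{G}\,\mathbf{F}\, p_F \text{ if } C_{i,j} = \mathrm{Inf}(F), \text{ and } D_{i,j} = \mathbf{F}\,\mathbf{G}\, \neg p_F \text{ if } C_{i,j} = \mathrm{Fin}(F).$$
We claim that for all paths $\pi$, we have $\pi \models \alpha$ if and only if $\lambda(\pi) \models \varphi$.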
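First suppose that $\pi \models \alpha$. Thus, by definition, we have for all $1 \leq i \leq n$ that $\pi \models \bigvee_{j=1}^{m} C_{i,j}$. This in turn implies that there exists some $j$ such that $\pi \models C_{i,j}$. If $C_{i,j} = \mathrm{Inf}(F)$, then this implies that $\mathrm{Inf}(\pi) \cap F \neq \emptyset$. Take any $s \in \mathrm{Inf}(\pi) \cap F$. By definition, we have $p_F \in \lambda(s)$ and so we also have $\lambda(\pi) \models \mathbf{G}\,\mathbf{F}\, p_F$. By construction, this implies that $\lambda(\pi) \models D_{i,j}$. Similarly, if $C_{i,j} = \mathrm{Fin}(F)$, then we have $\mathrm{Inf}(\pi) \cap F = \emptyset$. Thus, for all $s \in \mathrm{Inf}(\pi)$, we have $p_F \notin \lambda(s)$ and so we have $\lambda(\pi) \models \mathbf{F}\,\mathbf{G}\, \neg p_F$. By construction, this implies that $\lambda(\pi) \models D_{i,j}$. Thus, for all $i$, there exists some $j$ such that $\lambda(\pi) \models D_{i,j}$. This implies that $\lambda(\pi) \models \varphi$.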
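Conversely, suppose that $\lambda(\pi) \models \varphi$. So for all $i$, there exists a $j$ such that $\lambda(\pi) \models D_{i,j}$. If $D_{i,j} = \mathbf{G}\,\mathbf{F}\, p_F$, then $\pi$ visits some state $s \in F$ infinitely often. Thus, $\mathrm{Inf}(\pi) \cap F \neq \emptyset$, so $\pi \models \mathrm{Inf}(F)$. Similarly, if $D_{i,j} = \mathbf{F}\,\mathbf{G}\, \neg p_F$, then $\pi \models \mathrm{Fin}(F)$. Either way, we have $\pi \models C_{i,j}$. So for all $i$, there exists some $j$ such that $\pi \models C_{i,j}$, implying that $\pi \models \alpha$.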
Thus, ω-regular specifications can be seen as being 'isomorphic' to a strict subset of LTL. It is straightforward to come up with LTL formulae that cannot be written as ω-regular specifications: take, for instance, $\mathbf{G}\,\varphi$, where $\varphi$ is some non-trivial propositional formula. As such, we hope that the restriction of the setting may yield lower complexities when considering the analogous decision problems. That is, we will study a number of decision problems within the rational verification framework [10,17], where ω-regular specifications replace LTL specifications in a very natural way.
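The translation used in the proof of Proposition 1 is mechanical, and a small Python sketch may make it concrete. The encoding here is an assumption for illustration: a specification in conjunctive normal form is a list of clauses, each clause a list of `(kind, F)` pairs with `kind` either `"Inf"` or `"Fin"`; the names `prop_name`, `labelling` and `to_ltl` are hypothetical, and the LTL formula is rendered as an ASCII string with `G`, `F`, `!`, `&` and `|`. Unlike the proof, which introduces a proposition for every subset of states, the sketch only creates propositions for the sets that actually occur in the specification.

```python
def prop_name(F):
    """Name of the proposition p_F for a set of states F (naming scheme is illustrative)."""
    return "p_" + "_".join(sorted(F))

def labelling(states, cnf):
    """lambda(s) = { p_F | s in F }, restricted to the sets F occurring in cnf."""
    sets = {F for clause in cnf for (_, F) in clause}
    return {s: {prop_name(F) for F in sets if s in F} for s in states}

def to_ltl(cnf):
    """Replace Inf(F) by 'G F p_F' and Fin(F) by 'F G !p_F', keeping the CNF shape."""
    def literal(kind, F):
        p = prop_name(F)
        return f"G F {p}" if kind == "Inf" else f"F G !{p}"
    return " & ".join(
        "(" + " | ".join(literal(kind, F) for kind, F in clause) + ")"
        for clause in cnf
    )

# Inf({left}) and Inf({right}) over three hypothetical states.
cnf = [[("Inf", frozenset({"left"}))], [("Inf", frozenset({"right"}))]]
print(to_ltl(cnf))  # (G F p_left) & (G F p_right)
print(labelling(["left", "middle", "right"], cnf))
```

The resulting formula has one temporal literal per atom, so its length is linear in the number of atoms of the specification.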
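Firstly, given a game, a strategy profile, and an ω-regular specification, we can ask if the strategy profile is an equilibrium whose induced path models the specification. Secondly, given a game and an ω-regular specification, we can ask if the specification is modelled by the path/paths induced by some/every strategy profile in the set of equilibria of the game. Each of these problems can be phrased in the context of a non-cooperative game or a cooperative game, depending on whether we let the set of equilibria be, respectively, the Nash equilibria or the core of the game. Formally, in the non-cooperative case, we have the following decision problems:
MEMBERSHIP: Given: Game $G$, strategy profile $\vec{\sigma}$, and specification $\alpha$. Question: Is it the case that $\vec{\sigma} \in \mathrm{NE}(G)$ and $\pi(\vec{\sigma}) \models \alpha$?
E-NASH: Given: Game $G$ and specification $\alpha$. Question: Does there exist $\vec{\sigma} \in \mathrm{NE}(G)$ such that $\pi(\vec{\sigma}) \models \alpha$?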
A natural dual to E-NASH is the A-NASH problem, which instead of asking if the specification holds in the path induced by some Nash equilibrium, asks if the specification holds in all equilibria:
A-NASH: Given: Game $G$ and specification $\alpha$. Question: Is it the case that $\pi(\vec{\sigma}) \models \alpha$, for all $\vec{\sigma} \in \mathrm{NE}(G)$?
These decision problems were first studied in the context of iterated Boolean games [26], and are the 'flagship' decision problems of rational verification [18].
In the cooperative setting, the analogous decision problems are defined by substituting CORE(G) for NE(G). We refer to these problems as MEMBERSHIP, E-CORE, and A-CORE, respectively, (with a small abuse of notation for the first problem: context will make it clear whether we are referring to the non-cooperative or cooperative problem). These variants were first studied in the setting of LTL games [17].
It is worth noting here one technical detail about representations. In the E-NASH problem, the quantifier asks if there exists a Nash equilibrium which models the specification. This quantification ranges over all possible Nash equilibria and the strategies may be arbitrary strategies. However, in the MEMBERSHIP problem, the strategy σ is part of the input, and thus, needs to be finitely representable. Therefore, when considering E-NASH (or A-NASH, or the corresponding problems for the core), we place no restrictions on the strategies, but when reasoning about MEMBERSHIP, we work exclusively with memoryless or finite-memory strategies.
Before we proceed to studying all these problems in detail, we note that even though some other types of games (for example, two-player, turn-based, zero-sum mean-payoff, or parity games) can be solved using only memoryless strategies, this is not the case in our setting:
Proposition 2. There exist games $G$ and ω-regular specifications $\alpha$ such that $\pi(\vec{\sigma}) \models \alpha$ for some Nash equilibrium $\vec{\sigma}$, but for which there exists no memoryless Nash equilibrium $\vec{\sigma}'$ such that $\pi(\vec{\sigma}') \models \alpha$. The statement also holds true for the core.
In this game, each player decides whether they want to go left or right. When they both agree on what direction they want to go, they move in that direction. If they disagree, then they end up in the middle state. We now produce a finite-memory strategy profile, $\vec{\sigma}$, and a specification $\alpha$ such that $\vec{\sigma}$ is a Nash equilibrium and $\pi(\vec{\sigma}) \models \alpha$. The basic idea is that with finite-memory strategies, the two players can alternate between the left and right state, each achieving a strictly positive payoff, threatening to punish one another if either deviates from this arrangement, whilst this is not possible in memoryless strategies, as in this case the players would have to take the same action each time in the middle state.
Consider the following strategy, $\sigma_1$, for player 1 […]. It is easy to verify that $\vec{\sigma} = (\sigma_1, \sigma_2)$ is a Nash equilibrium. Each player gets a mean payoff of 1/4 under $\vec{\sigma}$. Suppose player 1 deviates to another strategy, which does not match the sequence of actions as dictated by $\sigma_2$. Then player 2 will just start playing L forever, meaning that the game will never enter the right state again, implying player 1 gets a payoff of zero. By symmetry, we can conclude that this is a Nash equilibrium. Moreover, letting $\alpha$ be the ω-regular specification $\alpha = \mathrm{Inf}(\{\mathrm{right}\}) \wedge \mathrm{Inf}(\{\mathrm{left}\})$, we see that $\pi(\vec{\sigma}) \models \alpha$.
However, note that there cannot exist a memoryless Nash equilibrium, $\vec{\sigma}'$, such that $\pi(\vec{\sigma}') \models \alpha$. In a memoryless strategy, for a given state, both players must commit to an action and only play that given action vector in that state. If both players play R in the middle state, then they will never be able to reach the left state, and if they both play L in the middle state, they will never reach the right state. Additionally, if they disagree, then they will perpetually stay in the middle state. Thus, there cannot exist a memoryless strategy profile $\vec{\sigma}'$ such that $\pi(\vec{\sigma}') \models \alpha$. This example demonstrates that, in general, memoryless strategies are not powerful enough for our games: there may exist a Nash equilibrium which satisfies the specification, with no memoryless Nash equilibrium which satisfies it. Finally, note that $\vec{\sigma}$ is also in the core: individual deviations have already been accounted for, and clearly no group deviation containing both players strictly improves both payoffs.

Non-Cooperative Games
In the non-cooperative setting, MEMBERSHIP, E-NASH, and A-NASH are the relevant decision problems. In this section, we will show that MEMBERSHIP lies in P for memoryless strategies, while E-NASH is NP-complete and even remains NP-hard when restricted to memoryless strategies; thus, it is no worse than solving a multi-player mean-payoff game [5]. Because A-NASH is the dual problem of E-NASH, it follows that A-NASH is co-NP-complete. In order to obtain some of these results, we also provide a semantic characterisation of the paths associated with strategy profiles in the set of Nash equilibria that satisfy a given ω-regular specification. We will first study the MEMBERSHIP problem, and then investigate E-NASH, providing an upper bound for arbitrary strategies and a lower bound for memoryless strategies, which are arguably the simplest, yet still computationally important, model of strategies one may want to consider for multi-agent systems.
Theorem 1. For memoryless strategies, MEMBERSHIP is in P.
Proof. We verify that a given strategy profile is a Nash equilibrium in the following way. We begin by calculating the payoff of each player in the strategy profile by 'running' the strategy and keeping note of the game states. When we encounter the same game state twice, we can simply take the average of the weights encountered at the states between the two occurrences, and that will be the payoff for a given player. By the pigeonhole principle, it will take no more than $|\mathrm{St}| + 1$ time steps to get to this point, and thus, this can be done in linear time.
Once we have each player's payoff, we can determine if they have any incentive to deviate in the following way: for each player, look at the graph induced by the strategy profile excluding them. Formally, the graph induced by a partial strategy profile $\vec{\sigma}_{-i}$, denoted by $G[\vec{\sigma}_{-i}] = (V, E)$, is defined as follows: the set of nodes, $V$, simply consists of the set of states of the game, that is, $V = \mathrm{St}$, and the set of edges are simply the moves available to player $i$; that is, if $e = (s_1, s_2) \in E$, then there exists some $ac_i \in \mathrm{Ac}_i$ such that $\mathrm{tr}(s_1, (\vec{\sigma}_{-i}(s_1), ac_i)) = s_2$.
We can use Karp's algorithm [27] to determine the maximum payoff that this player can achieve given the other players' strategies. If this payoff is higher than the payoff originally achieved, then this means player $i$ can successfully deviate from the given strategy profile and be better off, and so it is not a Nash equilibrium. If we do this for each player, and the maximum payoff each player can achieve is equal to their payoff in the given strategy profile, then we can conclude it is a Nash equilibrium, and moreover, we have determined this in polynomial time.
To determine if the strategy profile satisfies the specification, we run the strategy as before, and determine the periodic path which the game will end up in. This tells us which states will be finitely and infinitely visited, which in turn induces a valuation which will either model or not model the specification. Checking this can be done in polynomial time and thus, we can conclude that for memoryless strategies, MEMBERSHIP lies in P.
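The procedure in this proof can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the game is encoded with a dictionary `tr` from `(state, action_tuple)` to states, a memoryless profile is a list of per-player dictionaries from states to actions, the specification is passed as a predicate on the set of recurring states, and `max_cycle_mean` is a Karp-style dynamic program written for weights on states. All function names are hypothetical.

```python
from fractions import Fraction

def run_memoryless(s0, profile, tr):
    """Follow a memoryless profile from s0 until a state repeats.
    Returns (prefix, cycle) of the induced ultimately periodic path."""
    first_seen, path, s = {}, [], s0
    while s not in first_seen:
        first_seen[s] = len(path)
        path.append(s)
        s = tr[(s, tuple(strategy[s] for strategy in profile))]
    return path[:first_seen[s]], path[first_seen[s]:]

def max_cycle_mean(states, edges, weight, source):
    """Maximum mean weight of a cycle reachable from source (Karp-style DP).
    Weights sit on states and are charged when a state is entered."""
    n = len(states)
    # best[k][v]: maximum total weight of a walk with exactly k edges from source to v
    best = [{v: None for v in states} for _ in range(n + 1)]
    best[0][source] = 0
    for k in range(1, n + 1):
        for u, v in edges:
            if best[k - 1][u] is not None:
                cand = best[k - 1][u] + weight[v]
                if best[k][v] is None or cand > best[k][v]:
                    best[k][v] = cand
    value = None
    for v in states:
        if best[n][v] is None:
            continue
        m = min(Fraction(best[n][v] - best[k][v], n - k)
                for k in range(n) if best[k][v] is not None)
        if value is None or m > value:
            value = m
    return value

def membership(s0, states, actions, tr, weights, profile, spec):
    """True iff the memoryless profile is a Nash equilibrium whose path satisfies spec."""
    _, cycle = run_memoryless(s0, profile, tr)
    joint = lambda s: tuple(strategy[s] for strategy in profile)
    for i, w in enumerate(weights):
        payoff = Fraction(sum(w[s] for s in cycle), len(cycle))
        # Edges of the induced graph: player i is free, the others follow the profile.
        edges = set()
        for s in states:
            for a in actions[i]:
                move = joint(s)[:i] + (a,) + joint(s)[i + 1:]
                edges.add((s, tr[(s, move)]))
        if max_cycle_mean(states, edges, w, s0) > payoff:
            return False  # player i has a profitable deviation
    return spec(frozenset(cycle))

# Hypothetical one-player game: staying in "b" pays 2 per step, staying in "a" pays 0.
states = ["a", "b"]
actions = [["stay", "go"]]
tr = {("a", ("stay",)): "a", ("a", ("go",)): "b",
      ("b", ("stay",)): "b", ("b", ("go",)): "a"}
weights = [{"a": 0, "b": 2}]
visits_b = lambda inf: "b" in inf
print(membership("a", states, actions, tr, weights, [{"a": "go", "b": "stay"}], visits_b))    # True
print(membership("a", states, actions, tr, weights, [{"a": "stay", "b": "stay"}], visits_b))  # False
```

Against memoryless opponents, the best a deviating player can do is reach and repeat a cycle of maximum mean weight in the induced graph, which is why a single maximum-mean-cycle computation per player suffices in this sketch.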
The simplicity of the above algorithm may raise hopes that it might extend to finite-memory strategies. However, in this case, the configuration of the game is not just given by the current state: it is given by the current state, as well as the state that each of the players' strategies is in. Thus, we might have to visit at least $|\mathrm{St}| \cdot |Q|^{|\mathrm{Ag}|} + 1$ configurations (where $Q$ is the smallest set of strategy states over the set of players) until we discover a loop. Now, whilst there is an exponential dependency on the number of players in the input to the problem, this bound on the number of configurations is not necessarily polynomial in the size of the input. More precisely, the size of the underlying concurrent game structure is $|\mathrm{St}| \cdot |\mathrm{Ac}|^{|\mathrm{Ag}|}$, and so if $|Q|$ is larger than $|\mathrm{Ac}|$, the number of configurations will grow exponentially faster than the size of the input. Thus, we cannot use the above algorithm in the case of finite-memory strategies to get a polynomial time upper bound.
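To make the gap concrete, consider purely illustrative numbers (chosen here, not taken from the paper): $|\mathrm{St}| = 10$, $|\mathrm{Ac}| = 2$, $|Q| = 4$ and $|\mathrm{Ag}| = 5$. The input size and the configuration bound then compare as follows.

```latex
\[
  |\mathrm{St}| \cdot |\mathrm{Ac}|^{|\mathrm{Ag}|} = 10 \cdot 2^{5} = 320,
  \qquad
  |\mathrm{St}| \cdot |Q|^{|\mathrm{Ag}|} + 1 = 10 \cdot 4^{5} + 1 = 10241 .
\]
```

The ratio between the two is governed by $(|Q| / |\mathrm{Ac}|)^{|\mathrm{Ag}|} = 2^{5} = 32$, which grows exponentially as more players are added.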
We now consider the E-NASH problem. Instead of providing the full NP-completeness result here, we start by showing that the problem is NP-hard, even for memoryless strategies, and delay the proof of the upper bound until we develop a useful characterisation of Nash equilibrium in the context of ω-regular specifications. For now, we have the following hardness result, obtained using a reduction from the Hamiltonian cycle problem [28,29]; a similar, but simpler, argument can be found in [5].
Proposition 3. E-NASH is NP-hard, even for games with one player, constant weights, and memoryless strategies.
Proof. Let $G = (V, E)$ be a graph with $|V| = n$. We form a mean-payoff game $\mathcal{G}$ by letting $\mathrm{Ag} = \{1\}$ and $\mathrm{St} = V$. We pick the initial state arbitrarily and label it $s_0$, and the actions for player 1 correspond to the edges of $G$. That is, we have $\mathrm{Ac}_1 = E$ and we have $\mathrm{tr}(u, e) = v$ if and only if $e = (u, v) \in E$. Finally, we fix an integer constant $k \in \mathbb{Z}$ and let $w_1(s) = k$ for all $s \in \mathrm{St}$.
Let $F_s = \{s\}$ for each $s \in \mathrm{St}$ and let $\alpha$ be the following specification:
$$\alpha = \bigwedge_{s \in \mathrm{St}} \mathrm{Inf}(F_s).$$
We claim that the graph $G$ has a Hamiltonian cycle if and only if the game $\mathcal{G}$ has a memoryless Nash equilibrium $\vec{\sigma}$ such that $\pi(\vec{\sigma}) \models \alpha$.
First suppose that $G$ has a Hamiltonian cycle, $\pi = v_0 v_1 \ldots v_{n-1}$. We define a memoryless strategy, $\sigma_1$, for player 1 by setting $\sigma_1(v_i) = (v_i, v_{i+1})$, where the index is interpreted modulo $n$. As $\pi$ is a Hamiltonian cycle, it visits every node, so $\sigma_1$ is a well-defined, total function. By definition, we see that $\pi(\sigma_1) = \pi$, so we have $\pi(\sigma_1) \models \alpha$. Additionally, as each state has the same payoff, this strategy is trivially a Nash equilibrium.
Now suppose that $\mathcal{G}$ has a memoryless Nash equilibrium $\sigma_1$ such that $\pi(\sigma_1) \models \alpha$. Let $\pi(\sigma_1) = \pi$. Since $\sigma_1$ is memoryless, $\pi$ must be of the form $\pi = \pi[..i]\,\pi[i..j]^{\omega}$ for integers $i$ and $j$. Without loss of generality, assume that $i$ and $j$ are the smallest integers such that this holds. Moreover, since $\pi \models \alpha$, the path $\pi$ visits every state infinitely often, so we must have that $\pi[i..j]$ contains every state, and by memorylessness, it must be a cycle. Thus, by definition, $\pi[i..j]$ is a Hamiltonian cycle.
Theorem 1 and Proposition 3 together establish NP-completeness for multi-player mean-payoff games with ω-regular specifications and memoryless strategies: one can non-deterministically guess a memoryless strategy for each player (which is simply a list of actions for each player, one for each state), and use MEMBERSHIP to verify that it is indeed a Nash equilibrium that models the specification. However, as we shall show later, the problem is also NP-complete in the general case. To prove this, we need to develop an alternative characterisation of Nash equilibria.
To do this, we need to introduce the notion of the punishment value in a multi-player mean-payoff game; cf. [30,31]. The punishment value, $\mathrm{pun}_i(s)$, of a player $i$ in a given state $s$ can be thought of as the worst value the other players can impose on a player at a given state. Concretely, if we regard the game $G$ as a two-player, zero-sum game, where player $i$ plays against the coalition $\mathrm{Ag} \setminus \{i\}$, then the punishment value for player $i$ is the smallest mean-payoff value that the rest of the players in $\mathrm{Ag}$ can inflict on $i$ from a given state. Formally, given a player $i$ and a state $s \in \mathrm{St}$, we define the punishment value, $\mathrm{pun}_i(s)$, against player $i$ at state $s$, as follows:
$$\mathrm{pun}_i(s) = \min_{\vec{\sigma}_{-i}} \max_{\sigma'_i} \mathrm{pay}_i(\pi((\vec{\sigma}_{-i}, \sigma'_i), s)),$$
where $\pi(\vec{\sigma}, s)$ denotes the path induced by $\vec{\sigma}$ when the game starts from state $s$.
How efficiently can we calculate this value? As established in [5], we proceed in the following way: in a two-player, turn-based, zero-sum, mean-payoff game, positional strategies suffice to achieve the punishment value [4]. Thus, we can non-deterministically guess a pair of positional strategies for each player (one for the coalition punishing the player, and one for the player themselves), use Karp's algorithm [27] to find the maximum payoff for both the player and the coalition against their respective punishing strategies, and then verify that the two values coincide. With this established, we have the following lemma, which can be proved using techniques for mean-payoff games adapted from [5,15].

Lemma 1.
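Let $\pi$ be a path in $G$ and let $\{\vec{ac}[k]\}_{k \in \mathbb{N}}$ be the associated sequence of action profiles. Then there is a Nash equilibrium, $\vec{\sigma} \in \mathrm{NE}(G)$, such that $\pi = \pi(\vec{\sigma})$ if and only if there exists some $\vec{z} \in \mathbb{Q}^{\mathrm{Ag}}$, with $z_i \in \{\mathrm{pun}_i(s) \mid s \in \mathrm{St}\}$, such that:
• for each $k$, we have $\mathrm{pun}_i(\mathrm{tr}(\pi[k], (\vec{ac}[k]_{-i}, ac'_i))) \leq z_i$ for all $i \in \mathrm{Ag}$ and $ac'_i \in \mathrm{Ac}_i$, and;
• for all players $i \in \mathrm{Ag}$, we have $z_i \leq \mathrm{pay}_i(\pi)$.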
Proof. First assume that we have some Nash equilibrium $\vec{\sigma} \in \mathrm{NE}(G)$ such that $\pi = \pi(\vec{\sigma})$. Suppose there does not exist any $\vec{z} \in \mathbb{Q}^{\mathrm{Ag}}$ with the desired properties. Furthermore, let us first suppose that for all $\vec{z} \in \mathbb{Q}^{\mathrm{Ag}}$, with $z_i \in \{\mathrm{pun}_i(s) \mid s \in \mathrm{St}\}$, there exists some player $i \in \mathrm{Ag}$ such that $z_i > \mathrm{mp}(w_i(\pi))$. However, this is true for all […]. This means that player $i$ might as well play the positional strategy which ensures they achieve at least the punishment value. This in turn means that player $i$ has some deviation, contradicting the fact that $\vec{\sigma}$ is a Nash equilibrium. So instead, it must be the case that for all $\vec{z} \in \mathbb{Q}^{\mathrm{Ag}}$, with $z_i \in \{\mathrm{pun}_i(s) \mid s \in \mathrm{St}\}$, there is some time step $k$, a player $i$ and an action $ac'_i$ such that […]. Thus, it follows that the first part of the statement is true.
Now assume that there exists some $\vec{z} \in \mathbb{Q}^{\mathrm{Ag}}$ with the properties as prescribed in the statement of the lemma. We define a strategy profile $\vec{\sigma}$ in the following way. Each player follows $\pi$. If any player chooses to deviate from $\pi$, say at state $\pi[k]$, with an action $ac'_i$, then the remaining players play the punishing strategy which causes player $i$ to have a payoff of at most $\mathrm{pun}_i(\mathrm{tr}(\pi[k], (\vec{ac}[k]_{-i}, ac'_i))) \leq z_i \leq \mathrm{pay}_i(\pi)$. Thus, no player has any incentive to deviate away from $\vec{\sigma}$ and so we have a Nash equilibrium with $\pi(\vec{\sigma}) = \pi$.
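For an ultimately periodic path, the two conditions of Lemma 1 only need to be checked at finitely many positions. The Python sketch below illustrates this under several assumptions that are not part of the paper: the path is given as a prefix and a cycle, the action profiles repeat with the same period as the states, and the punishment values are supplied as a precomputed table `pun[i][s]`. The function name `satisfies_lemma1` is hypothetical.

```python
from fractions import Fraction

def satisfies_lemma1(prefix, cycle, moves, tr, actions, pun, weights, z):
    """Check the two conditions of the lemma on the path prefix . cycle^omega.
    moves[k] is the action profile played at position k of prefix + cycle."""
    path = prefix + cycle
    n_players = len(actions)
    # Condition 1: wherever a single player moves the game to,
    # that player's punishment value there does not exceed z_i.
    for s, move in zip(path, moves):
        for i in range(n_players):
            for a in actions[i]:
                deviated = move[:i] + (a,) + move[i + 1:]
                if pun[i][tr[(s, deviated)]] > z[i]:
                    return False
    # Condition 2: every player earns at least z_i along the path.
    return all(
        z[i] <= Fraction(sum(weights[i][s] for s in cycle), len(cycle))
        for i in range(n_players)
    )

# Hypothetical one-player game; with no opponents, the punishment value of a
# state is simply the best mean-payoff the player can secure, here 2 everywhere.
actions = [["stay", "go"]]
tr = {("a", ("stay",)): "a", ("a", ("go",)): "b",
      ("b", ("stay",)): "b", ("b", ("go",)): "a"}
weights = [{"a": 0, "b": 2}]
pun = [{"a": 2, "b": 2}]
z = [2]
print(satisfies_lemma1(["a"], ["b"], [("go",), ("stay",)], tr, actions, pun, weights, z))  # True
print(satisfies_lemma1([], ["a"], [("stay",)], tr, actions, pun, weights, z))              # False
```

The second call fails only on the payoff condition: looping in the state of weight 0 earns less than the threshold of 2.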
With this lemma in mind, we define a graph, G[ z; F] = (V, E), as follows. We set V = St and include e = (u, v) ∈ E if there exists some action profile ac such that v = tr(u, ac) with pun i (tr(u, ( ac −i , ac i ))) ≤ z i for all i ∈ Ag and ac i ∈ Ac i . Having done this, we then prune any components which cannot be reached from the start state and then remove all states and edges not contained in F, before reintroducing any states in F that may have been removed.

Thus, given this definition and the preceding lemma, to determine if there exists a Nash equilibrium which satisfies an ω-regular specification, α, we calculate the punishment values, and guess a vector z s ∈ St Ag , as well as a set of states, F, which satisfies the specification. Letting z i = pun i (z s ), we form the graph G[ z; F] and then check if there is some path π in G[ z; F] with z i ≤ pay i (π) for each player i which visits every state infinitely often. Trivially, if this graph is not strongly connected, then no path can visit every state infinitely often. Thus, to determine if the above condition holds, we need one more piece of technical machinery, in the form of the following proposition:

Proposition 4. Let G = (V, E) be a strongly connected graph, let {w i } i∈Ag be a set of weight functions, and let z ∈ Q Ag . Then, we can determine in polynomial time if there is some path π with the properties:
• π visits every state infinitely often;
• z i ≤ pay i (π) for each i ∈ Ag.

Conceptually, Proposition 4 is similar to Theorem 18 of [5], but with two key differences. Firstly, we need to do additional work to determine if there is a path that visits every state infinitely often. Secondly, the argument of [5] is adapted so that we obtain the corollary that if there is a Nash equilibrium that models the specification, then there is some finite-state Nash equilibrium that also models the specification. This means that the construction in our proof can be used not only for verification, but also for synthesis.
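The edge relation of G[ z; F] can be sketched directly from its definition. The Python fragment below is only an illustration under assumed encodings that do not come from the paper: `actions[i]` lists the actions of player i, `tr(state, profile)` is the transition function on tuples of actions, `pun[i][s]` holds precomputed punishment values, and `z[i]` is the guessed threshold. The reachability pruning step is omitted; only the restriction to F is shown.

```python
from itertools import product


def punishable_edges(states, actions, tr, pun, z):
    """Edges (u, v) such that some action profile leads from u to v and every
    unilateral deviation from that profile lands in a state whose punishment
    value for the deviating player is at most z[i]."""
    edges = set()
    players = range(len(actions))
    for u in states:
        for ac in product(*actions):
            safe = all(
                pun[i][tr(u, ac[:i] + (dev,) + ac[i + 1:])] <= z[i]
                for i in players
                for dev in actions[i]
            )
            if safe:
                edges.add((u, tr(u, ac)))
    return edges


def restrict_to(edges, F):
    """Keep only the edges whose endpoints both lie in the guessed set F."""
    return {(u, v) for (u, v) in edges if u in F and v in F}
```

Note that the quantification over `dev` includes the prescribed action itself, mirroring the condition "for all ac i ∈ Ac i" in the definition.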
For clarity of presentation, we shall split the above proposition into two constituent lemmas. To do this, we begin by defining a system of linear inequalities. We then go on to show that there is a path π with the desired properties if and only if this system has a solution, with one lemma for each direction. As the system of inequalities has polynomial size and can be solved in polynomial time, this yields our result.
For a graph G, a set of weight functions {w i } i∈Ag , and a vector z ∈ Q Ag , we define a system of linear inequalities, (G, {w i } i∈Ag , z), as follows: for each agent i ∈ Ag, and each edge e ∈ E, introduce a variable x i,e , along with the following inequalities: (i) x i,e ≥ 0 for each agent i ∈ Ag and for each edge e ∈ E.
It is worth briefly discussing what this system is actually encoding. Roughly speaking, the set C i = {x i,e } e∈E defines a cycle for each player that makes sure their payoff is at least z i . Each x i,e represents the proportion of the cycle that is spent on a given edge. The idea is that we define a path by visiting each C i an appropriate number of times, before travelling to the next cycle and visiting that repeatedly.
Conversely, if there exists some path with the stated properties, it will also define a solution to the system of inequalities. This has the corollary that if there is a Nash equilibrium in the game, then there is a finite-state Nash equilibrium as well.
In what follows, for an edge e = (u, v) and a weight function w : V → Z, we define w(e) := w(u). We also extend weight functions to finite paths in the natural way, by summing along them.

Lemma 2.
Let G = (V, E) be a strongly connected graph, let {w i } i∈Ag be a set of weight functions, and let z ∈ Q Ag . Furthermore, suppose there exists some path π which visits every state infinitely often and such that z i ≤ pay i (π) for each i ∈ Ag. Then (G, {w i } i∈Ag , z) has a solution.
Proof. First suppose there exists some path π with the stated properties. For each n > 0 and e ∈ E, define λ(n, e) to be the following quantity:

$$\lambda(n, e) = \frac{\bigl|\{\, k < n : (\pi[k], \pi[k+1]) = e \,\}\bigr|}{n}.$$

Informally, for a given edge e, λ(n, e) gives us the proportion of steps in the prefix π[..n] on which e is taken. Note that 0 ≤ λ(n, e) ≤ 1 for all e ∈ E and all n > 0. Additionally, for each n, it is easy to see that we have:

$$\sum_{e \in E} \lambda(n, e) = 1. \tag{2}$$

By definition, for each i ∈ Ag we have:

$$\mathrm{pay}_i(\pi) = \liminf_{n \to \infty} \sum_{e \in E} \lambda(n, e)\, w_i(e).$$

Thus, there must exist some subsequence of the natural numbers, {n i t } t∈N , such that:

$$\lim_{t \to \infty} \sum_{e \in E} \lambda(n^i_t, e)\, w_i(e) = \mathrm{pay}_i(\pi).$$

With this defined, we introduce a sequence of numbers, {ϕ i t (e)} t∈N for each edge e ∈ E, by defining ϕ i t (e) = λ(n i t , e). Now, as {ϕ i t (e)} t∈N is a bounded sequence, by the Bolzano-Weierstrass theorem, there must be a convergent subsequence for each edge e, {ψ i t (e)} t∈N . We define x * i,e = lim t→∞ ψ i t (e) and claim that these form a solution to the system of inequalities.

Since 0 ≤ ψ i t (e) ≤ 1 for all t, we have x * i,e ≥ 0. Thus, Inequality (i) is satisfied. Additionally, by Equation (2), we can deduce that for each t, and for every i ∈ Ag, we have:

$$\sum_{e \in E} \psi^i_t(e) = 1.$$

Taking limits, we can conclude that Inequality (ii) is satisfied. To establish (iii), fix a v ∈ V. In a path, if we enter a node, we must exit it. Thus, by definition of λ, for each n we have:

$$\Bigl|\, \sum_{e = (u, v) \in E} \lambda(n, e) \;-\; \sum_{e = (v, u) \in E} \lambda(n, e) \Bigr| \le \frac{1}{n}.$$

Taking the relevant subsequence and letting t → ∞, we can deduce that Inequality (iii) is satisfied. To establish (iv), first note that for all i ∈ Ag, we have, The equality between lines 4 and 5 is valid, as we have established that the limit exists in line 2. Thus, since we have z i ≤ pay i (π), by assumption, we can conclude that Inequality (iv) is valid. In a similar fashion, note that for all i, j ∈ Ag, we have, This, together with Inequality (iv), implies that Inequality (v) is satisfied. Thus, putting all this together, we can conclude that if there exists some path with the stated properties, then there also exists a solution to (G, {w i } i∈Ag , z), the system of linear inequalities.
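To make the role of the edge frequencies in the proof of Lemma 2 concrete, the following Python sketch computes them for a finite prefix of a path and measures how far they are from satisfying flow conservation at a vertex. It assumes, as our own reading rather than a definition taken from the paper, that λ(n, e) counts occurrences of e among the first n steps of the path; the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction


def edge_frequencies(path, n):
    """Proportion of the first n steps of `path` spent on each edge.

    `path` is a list of states of length at least n + 1.
    """
    counts = Counter(zip(path[:n], path[1:n + 1]))
    return {edge: Fraction(c, n) for edge, c in counts.items()}


def flow_imbalance(freq, v):
    """Absolute difference between incoming and outgoing frequency at v."""
    inflow = sum(f for (a, b), f in freq.items() if b == v)
    outflow = sum(f for (a, b), f in freq.items() if a == v)
    return abs(inflow - outflow)
```

The frequencies of any prefix sum to 1, and the imbalance at each vertex is at most 1/n, which is why flow conservation holds exactly in the limit.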
We now show the other direction:

Lemma 3. Let G = (V, E) be a strongly connected graph, let {w i } i∈Ag be a set of weight functions, and let z ∈ Q Ag . Suppose that (G, {w i } i∈Ag , z) has a solution. Then there exists some path π which visits every state infinitely often and such that z i ≤ pay i (π) for each i ∈ Ag.

Proof.
Given that there is a solution to (G, {w i } i∈Ag , z), there must exist a solution that consists of rational numbers. Thus, letting x * i,e be a rational solution, we can write x * i,e = p i,e /q, for some p i,e ∈ N, and some appropriately chosen q ∈ N. Now, for each i ∈ Ag, we form a multigraph, G i = (V i , E i ), which takes G and replaces each edge e by p i,e copies. Note that whilst G is strongly connected, some of the p i,e may be equal to 0, which would mean that G i is disconnected. Now, by Inequality (iii), for each v ∈ V, we have:

$$\sum_{e = (u, v) \in E} p_{i,e} = \sum_{e = (v, u) \in E} p_{i,e}.$$

Thus, the in-degree of each vertex in V i is equal to its out-degree, and so each strongly connected component of G i contains an Eulerian cycle (an Eulerian cycle is a path that starts and ends at the same node and visits every edge exactly once). Interpreting each of these Eulerian cycles as paths in G, we get a set of (not necessarily simple) cycles for each agent, C 1 i , . . . , C m i i , where m i ≤ |V|.

Now for each agent i ∈ Ag, and every n ∈ N, we define a cycle C n i which starts at the first state of C 1 i , traces C 1 i n times, takes the shortest path from the start state of C 1 i to C 2 i , traces C 2 i n times and repeats this process for all the cycles of agent i, until it has traced C m i i n times. From here, it then takes the shortest path to the start state of C 1 i . Let M i be the largest of the absolute values of the weights of agent i. That is, M i = max v∈V |w i (v)|. Putting these together, we get: Taking limits, we see that: Thus, for n * i sufficiently large, we must also have, for all j ∈ Ag: Moreover, if n * i is large enough, the above equation will be true for all n ≥ n * i . Assuming this is the case, we set n * = max i∈Ag n * i . Then, for all i, j ∈ Ag, we have,

We are now ready to define a path π N , which will form the basis for our path of interest, π. The path starts by visiting C n * 1 N times, before taking the shortest path to the start of C n * 2 , and visiting that N times, and so on. Once C n * |Ag| has been visited N times, we then take a shortest path that visits all states that have not yet been visited, before returning to the start vertex of C n * 1 . Let us look at each player's payoff on π N . We have: Taking limits as before, we can conclude that for N * sufficiently large, we have: We then simply define π to be the path that repeatedly traverses π N * . It is easy to see that pay i (π) ≥ z i and that π visits every state infinitely often.
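The step in the proof of Lemma 3 that extracts an Eulerian cycle from the multigraph G i can be illustrated with Hierholzer's algorithm. The paper does not name a particular algorithm for this step; the Python sketch below is one standard choice. It assumes that the multiplicities p i,e are given as a dictionary from edges to non-negative integers, that every vertex has equal in-degree and out-degree, and it returns the Eulerian cycle of the component containing `start`.

```python
from collections import defaultdict


def eulerian_cycle(multiplicity, start):
    """Hierholzer's algorithm on a balanced directed multigraph.

    multiplicity -- dict mapping (u, v) to the number of parallel copies
    start        -- vertex at which the cycle should begin and end
    Returns the cycle as a list of vertices (first == last), covering every
    edge copy in the component of `start` exactly once.
    """
    # Adjacency lists with one entry per parallel copy of an edge
    out = defaultdict(list)
    for (u, v), m in multiplicity.items():
        out[u].extend([v] * m)

    stack, cycle = [start], []
    while stack:
        u = stack[-1]
        if out[u]:
            # Follow an unused edge out of u
            stack.append(out[u].pop())
        else:
            # u has no unused edges left: it is final in the cycle
            cycle.append(stack.pop())
    return cycle[::-1]
```

Running this once per component of G i yields the cycles that the proof then stitches together with shortest connecting paths.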
We can now combine these two results to obtain our proof:

Proof. (Proof of Proposition 4) Given the statement of the proposition, we construct an algorithm which forms the linear program (G, {w i } i∈Ag , z) and checks whether it has a solution. As the linear program is of polynomial size, and as linear programs can be solved in polynomial time, the overall algorithm runs in polynomial time. By Lemma 3, the algorithm is sound, and by Lemma 2, it is complete.
From the propositions above, we can establish that E-NASH is NP-complete:

Proof. For NP-hardness we have Proposition 3. For the upper bound, suppose we have an instance, (G, α), of the problem. Then we proceed as follows. We non-deterministically guess pairs of punishing strategy profiles, (ζ i , ζ −i ) for each player i ∈ Ag, a state z s for each player, and a set of states F. From these, we can easily check that the valuation induced by F satisfies the specification and we can also use Karp's algorithm to compute the punishment values, pun i (s), for each state s ∈ St and for each player i ∈ Ag. Setting z i = pun i (z s ), we invoke Lemma 1 and form the graph G[ z; F]. If it is not strongly connected, then we reject. Otherwise, we use Proposition 4 to determine if the associated linear program has a solution. If it does, then we accept; otherwise, we reject.
Another benefit of splitting Proposition 4 is that it readily yields the following corollary:

Corollary 1. Let G be a game and α an ω-regular specification. Suppose that G has some Nash equilibrium σ such that σ |= α. Then, G also has some finite-memory Nash equilibrium σ′ such that σ′ |= α.
Proof. Suppose G has some Nash equilibrium σ such that σ |= α. Furthermore, let π = π( σ). Then by Lemma 1, there exists some z ∈ Q Ag , with z i ∈ {pun i (s) | s ∈ St}, and a subset F ⊆ St such that in G[ z; F], π visits every state infinitely often and for each i ∈ Ag, we have z i ≤ pay i (π). By Lemma 2, we see that the system of inequalities (G, {w i } i∈Ag , z) has a solution. However, then by Lemma 3, we see that there exists some periodic path π′ such that in G[ z; F], again, π′ visits every state infinitely often and for each i ∈ Ag, we have z i ≤ pay i (π′). Thus, applying Lemma 1 again, we see that G has some Nash equilibrium σ′ such that σ′ |= α with π′ = π( σ′). Moreover, by the construction in Lemma 3, we may assume that σ′ is a finite-memory strategy profile.

Cooperative Games
We now move on to investigate cooperative solution concepts, and in particular, the core. We start by studying the relationship between Nash equilibrium and the core in our context. We first establish that they are indeed different: Thus, the two concepts do not coincide beyond the one-player case. In fact, there are two-player games in which the set of Nash equilibria is empty, while the core is not, which demonstrates that the core is not a refinement of Nash equilibrium. Nor does the other direction hold: Nash equilibrium is not a refinement of the core. The following two games (Figures 5 and 6) serve as witnesses to these claims.

[Figures 5 and 6: the games G 1 and G 2 . The asterisks * are wildcards: they match any action which is not explicitly detailed on the diagram.]
As we can see in the game in Figure 5, G 1 has a Nash equilibrium that is not in the core (and in which both players get a mean-payoff of 0-cf., Player 1 choosing D at the initial state while Player 2 chooses D at the middle state), since the coalition containing both players has a beneficial deviation in which both players get a mean-payoff of 1. On the other hand, in the game in Figure 6, G 2 has a non-empty core (consider every possible memoryless strategy) while it has an empty set of Nash equilibria. Moreover, in both games, the detailed strategies can be implemented without memory.
Regarding memory requirements, as with Nash equilibrium, it may be that, in general, memoryless strategies are not enough to implement all equilibria in the cooperative setting. Indeed, there are games with a non-empty core in which no strategy profile in the core can be implemented using memoryless strategies only. Take, for instance, the game shown in Figure 2, with the weights changed to w i (mid) = 1, w 1 (right) = 6 = w 2 (left). Only if the two players collaborate and alternate between left and right will they get their optimal mean-payoffs (of 2). Clearly, such an optimal payoff for both players cannot be obtained using only memoryless strategies.
Another important (game-theoretic) question about cooperative games is whether they have a non-empty core, a property that holds for games with LTL goals and specifications [17]. However, that is not the case for mean-payoff games, at least for games with |Ag| > 2.

Proposition 6.
In mean-payoff games, if |Ag| ≤ 2, then the core is non-empty. For |Ag| > 2, there exist games with an empty core.
Proof. If |Ag| = 1, then by Proposition 5 the core coincides with the set of Nash equilibria, which is always non-empty in one-player games. For two-player games, let σ = (σ 1 , σ 2 ) be any strategy profile. If σ is not in the core, then either Player 1, or Player 2, or the coalition consisting of both players has a beneficial deviation. If the latter is true, then there is a strategy profile, σ′ = (σ′ 1 , σ′ 2 ), such that σ′ ≻ i σ for both i ∈ {1, 2}. We repeat this process until the coalition of both players does not have a beneficial deviation. This must eventually be the case as each player's payoff is capped by their maximum weight, so either they both reach their corresponding maximum weight, or there comes a point when they cannot beneficially deviate together. At this point, we must either be in the core, or either Player 1 or Player 2 has a beneficial deviation. If Player j (j ∈ {1, 2}) has a beneficial deviation, say σ′ j , then any strategy profile (σ′ j , σ′ i ), with i ≠ j, that maximises Player i's mean-payoff is in the core. Thus, for every two-player game, there exists some strategy profile that lies in the core.
However, for three-player mean-payoff games, in general, the core of a game may be empty. Consider the following three-player game G, where each player has two actions, H, T, and there are four states, P, R, B, Y. The states are weighted for each player as follows: If the game is in any state other than P, then no matter what set of actions is taken, the game will remain in that state. Thus, we only specify the transitions for the state P:

Note that strategies are characterised by the state that the game eventually ends up in. If the players stay in P forever, then they can all collectively change strategy to move to one of R, B, Y, and each get a better payoff. Now, if the game ends up in R, then players 2 and 3 can deviate by playing (T, H), and no matter what player 1 plays, they will be in state B, and will be better off. Similarly, if the game is in B, then players 1 and 3 can deviate by playing (T, T) to enter state Y, in which they both will be better off, regardless of what player 2 does. Finally, if the game is in Y, then players 1 and 2 can deviate by playing (H, H) to enter R and will be better off regardless of what player 3 plays. Thus, no strategy profile lies in the core.
Before proceeding, it is worth reflecting on the definition of the core. We can redefine this solution concept in the language of 'beneficial deviations'. That is, given a game G and a strategy profile σ, we say that a beneficial deviation by a coalition C is a strategy vector σ′ C such that for all complementary strategy profiles σ′ Ag\C , we have π( σ′ C , σ′ Ag\C ) ≻ i π( σ) for all i ∈ C. We can then say that σ is a member of the core if there exists no coalition C which has a beneficial deviation from σ. Note that this formulation is equivalent to our earlier definition of the core.
From a computational perspective, there is an immediate concern here: given a potential beneficial deviation, how can we verify that it is preferable to the status quo under all possible counter-responses? Given that strategies can be arbitrary mathematical functions, how can we reason about that universal quantification effectively? Fortunately, as we show in the following lemma, we can restrict our attention to memoryless strategies when thinking about potential counter-responses to players' deviations:

Lemma 4. Let G be a game, C ⊆ Ag be a coalition and σ be a strategy profile. Further suppose that σ′ C is a strategy vector such that for all memoryless strategy vectors σ′ Ag\C , we have π( σ′ C , σ′ Ag\C ) ≻ i π( σ) for all i ∈ C. Then, for all strategy vectors σ′ Ag\C , not necessarily memoryless, we have π( σ′ C , σ′ Ag\C ) ≻ i π( σ) for all i ∈ C.

Before we prove this, we need to introduce an auxiliary concept of two-player, turn-based, zero-sum, multi-mean-payoff games [32] (we will just call these multi-mean-payoff games moving forward). Informally, these are similar to two-player, turn-based, zero-sum mean-payoff games, except player 1 has k weight functions associated with the edges, and they are trying to ensure the resulting k-vector of mean-payoffs is component-wise greater than or equal to a vector threshold. Formally, a multi-mean-payoff game is a 6-tuple, G = (V 1 , V 2 , v 0 , E, w, z k ), where V 1 , V 2 are sets of states controlled by players 1 and 2, respectively, with V = V 1 ∪ V 2 the state space, v 0 ∈ V the start state, E ⊆ V × V a set of edges and w : E → Z k a weight function, assigning to each edge a vector of weights.
The game is played by starting in the start state, v 0 ∈ V i , and player i choosing an edge (v 0 , v 1 ), and traversing it to the next state. From this new state, v 1 ∈ V j , player j chooses an edge and so on, repeating this process forever. Paths are defined in the usual way and the payoff of a path π, pay(π), is simply the vector (mp(w 1 (π)), . . . , mp(w k (π))). Finally, z k ∈ Q k is a threshold vector and player 1 wins if pay i (π) ≥ z i for all i ∈ {1, . . . , k}, and loses otherwise. An important question associated with these games is whether player 1 can force a win; we refer to this decision problem as MULTI-MEAN-PAYOFF-THRESHOLD. As shown in [32], this problem is co-NP-complete. Whilst we do not need to utilise this result right now, this sets us up to prove Lemma 4:

Proof. (Proof of Lemma 4) Let σ′ Ag\C be an arbitrary strategy and let i ∈ C be an arbitrary agent. Denote π( σ) by π and π( σ′ C , σ′ Ag\C ) by π′. We aim to show that π′ ≻ i π. Suppose instead it is the case that π ⪰ i π′. Thus, we have π( σ) ⪰ i π( σ′ C , σ′ Ag\C ). Considering this as a two-player multi-mean-payoff game, where player 1's strategy is fixed and encoded into the game structure (i.e., player 1 follows σ′ C , but has no say in the matter), and the payoff threshold is mp(π( σ)), then σ′ Ag\C is a winning strategy for player 2 in this game. Now, by [32,33], if player 2 has a winning strategy, then they have a memoryless winning strategy. Thus, there is a memoryless strategy σ″ Ag\C such that π( σ) ⪰ i π( σ′ C , σ″ Ag\C ). However, this contradicts the assumptions of the lemma, and thus we must have π′ ≻ i π. (In [32], the winning condition relates to whether a player's payoff is greater than or equal to a given vector. One can adapt this argument to show it is also true for strict inequalities.)
We now look at some complexity bounds for mean-payoff games in the cooperative setting. Having introduced beneficial deviations, let us consider the following decision problem:

BENEFICIAL-DEVIATION (BEN-DEV):
Given: Game G and strategy profile σ.
Question: Is there C ⊆ Ag and σ C ∈ Σ C such that for all σ Ag\C ∈ Σ Ag\C and for all i ∈ C, we have π( σ C , σ Ag\C ) ≻ i π( σ)?
Using this new problem, we can prove the following statement:

Proposition 7. Let G be a game, σ a strategy profile, and α a specification. Then, (G, σ, α) ∈ MEMBERSHIP if and only if (G, σ) ∉ BEN-DEV and π( σ) |= α.

Proof. This follows directly from the definitions.
The above proposition characterises the MEMBERSHIP problem for cooperative games in terms of beneficial deviations, and, in turn, provides a direct way to study its complexity. In the remainder of this section we concentrate on the memoryless case.

Theorem 3. For memoryless strategies, BEN-DEV is NP-complete.

Proof. For membership in NP, first guess a deviating coalition C and a strategy profile σ C for that coalition. Then, use the following three-step algorithm. First, compute the mean-payoffs that players in C get on π( σ), that is, a set of values z * j = pay j (π( σ)) for every j ∈ C; this can be done in polynomial time simply by 'running' the strategy profile σ. Then compute the graph G[ σ C ], which contains all possible behaviours (i.e., strategy profiles) for Ag \ C with respect to σ; this construction is similar to the one used in the proof of Theorem 1, that is, it is the game obtained when we fix σ C , and can be done in polynomial time. Finally, we ask whether every path π in G[ σ C ] satisfies pay j (π) > z * j , for every j ∈ C; for this step, we can use Karp's algorithm to answer the question in polynomial time for every j ∈ C. If every path in G[ σ C ] has this property, then we accept; otherwise, we reject.
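The first step of this algorithm, 'running' a memoryless strategy profile, can be sketched as follows. Because the profile is memoryless, the induced path is a lasso: a finite stem followed by a cycle repeated forever, so each player's mean-payoff is the average weight on that cycle. The Python encoding below is an assumption made for illustration: `profile[s]` is the action profile chosen in state s, `tr` is the transition function, and `weights[j][s]` is the weight of state s for player j.

```python
from fractions import Fraction


def mean_payoffs(start, profile, tr, weights):
    """Mean-payoff of every player on the path induced by a memoryless
    profile: follow the unique successor until a state repeats, then
    average the state weights over the resulting cycle."""
    position, path, s = {}, [], start
    while s not in position:
        position[s] = len(path)
        path.append(s)
        s = tr(s, profile[s])
    cycle = path[position[s]:]  # the part of the path repeated forever
    return [Fraction(sum(w[t] for t in cycle), len(cycle)) for w in weights]
```

The loop terminates after at most |St| steps, which matches the polynomial-time claim for this step.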
For hardness, we use a small variation of the construction presented in [34]. Let P = {x 1 , . . . , x n } be a set of atomic propositions. From a Boolean formula ϕ = ⋀ 1≤c≤m C c (in conjunctive normal form) over P, where each C c = l c1 ∨ l c2 ∨ l c3 , and each literal l ck = x j or ¬x j , with 1 ≤ k ≤ 3, for some 1 ≤ j ≤ n, we construct M = (Ag, St, s 0 , (Ac i ) i∈Ag , tr), an m-player concurrent game structure defined as follows:
• Ac i = {t, f }, for every i ∈ Ag, and Ac = Ac 1 × · · · × Ac m ;
• For tr, refer to the figure below, such that T = {(t 1 , . . . , t m )} and F = Ac \ T.

The concurrent game structure so generated is illustrated in Figure 8. With M at hand, we build a mean-payoff game using the following weight function: Then, we consider the game G over M and any strategy profile (in memoryless strategies) such that σ(s 0 ) = y * . For any such strategy profile, the mean-payoff of every player is 0. However, if ϕ is satisfiable, then there is a path in M, from y 0 to y n , such that in such a path, for every player, there is a state in which its payoff is not 0. Thus, the grand coalition Ag has an incentive to deviate since traversing that path infinitely often will give each player a mean-payoff strictly greater than 0.

Observe two things. Firstly, only if the grand coalition Ag agrees can the game visit y 0 after y 0 . Otherwise, the game will necessarily end up in y * forever after. Secondly, because we are considering memoryless strategies, the path from y 0 to y n followed at the beginning is the same path that will be followed thereafter, infinitely often. Then, we can conclude that there is a beneficial deviation (necessarily for Ag) if and only if ϕ is satisfiable, as otherwise at least one of the players in the game will not have an incentive to deviate (because its mean-payoff would continue to be 0). Then, formally, we can conclude that (G, σ) ∈ BEN-DEV if and only if ϕ is satisfiable.
From Theorem 3 it follows that checking if no coalition of players has a beneficial deviation with respect to a given strategy profile is a co-NP problem. More importantly, it also follows that MEMBERSHIP is co-NP-complete.

Theorem 4. For memoryless strategies, MEMBERSHIP is co-NP-complete.
Proof. Recall that given a game G, a strategy profile σ, and an ω-regular specification α, we have (G, σ, α) ∈ MEMBERSHIP if and only if (G, σ) ∉ BEN-DEV and π( σ) |= α. Thus, we can solve MEMBERSHIP by first checking whether π( σ) |= α, which can be done in polynomial time, and rejecting if that check fails. If π( σ) |= α, then we ask whether (G, σ) ∈ BEN-DEV, and accept if it is not, and reject otherwise. Finally, since BEN-DEV is NP-hard, it follows from the above procedure that MEMBERSHIP is co-NP-hard, which concludes the proof of the statement.
BEN-DEV can also be used to solve E-CORE in this case.

Theorem 5. For memoryless strategies, E-CORE is in Σ P 2 .
Proof. Given any instance (G, α), we guess a strategy profile σ and check that π( σ) |= α and that (G, σ) is not an instance of BEN-DEV. While the former can be done in polynomial time, the latter can be solved in co-NP using an oracle for BEN-DEV. Thus, we have a procedure that runs in NP^co-NP = NP^NP = Σ P 2 .
From Theorem 5 it follows that A-CORE is in Π P 2 and, more importantly, that even checking that the core has any memoryless solutions (but not necessarily that it is empty) is also in Σ P 2 . This result sharply contrasts with that for Nash equilibrium, where the same problem lies in NP. More importantly, the result also shows that the (complexity) dependence on the type of coalitional deviation is only weak, in the sense that different types of beneficial deviations may be considered within the same complexity class, as long as such deviations can be checked with an NP or co-NP oracle. For instance, in [17] other types of cooperative solution concepts are defined, which differ from the one in this paper (known in the cooperative game theory literature as the α-core [7]) simply in the type of beneficial deviation under consideration.

Another concept introduced in [17] is that of a 'fulfilled coalition', which informally characterises coalitions that have the strategic power (a joint strategy) to ensure a minimum given payoff no matter what the other players in the game do. Generalising to our setting, from qualitative to quantitative payoffs, we introduce the notion of a lower bound: let C ⊆ Ag be a coalition in a game G and let z C ∈ Q C . We say that z C = (z 1 , . . . , z i , . . . , z |C| ) is a lower bound for C if there is a joint strategy σ C for C such that for all strategies σ −C for Ag \ C, we have pay i (π( σ C , σ −C )) ≥ z i , for every i ∈ C.
Based on the definition above, we can prove the following lemma, which characterises the core in terms of paths where (mean-)payoffs can be ensured collectively, regardless of any adversarial behaviour.

Lemma 5.
Let π be a path in G. There is σ ∈ CORE(G) such that π = π( σ) if and only if for every coalition C ⊆ Ag and lower bound z C ∈ Q C for C, there is some i ∈ C such that z i ≤ pay i (π).
Proof. To show the left-to-right direction, suppose that there exists a member of the core σ ∈ CORE(G) with π = π( σ) and suppose further that there is some coalition C ⊆ Ag and lower bound z C ∈ Q C for C, such that for every i ∈ C we have z i > pay i (π). Because z C is a lower bound for C, and z i > pay i (π), for every i ∈ C, then there is a joint strategy σ C for C such that for all strategies σ −C for Ag \ C, we have pay i (π( σ C , σ −C )) ≥ z i > pay i (π), for every i ∈ C. Then, it follows that (G, σ) ∈ BEN-DEV, which further implies that σ cannot be in the core of G-a contradiction to our initial hypothesis.
For the right-to-left direction, suppose that there is π in G such that for every coalition C ⊆ Ag and lower bound z C ∈ Q C for C, there is i ∈ C such that z i ≤ pay i (π). We then simply let σ be any strategy profile such that π = π( σ). Now, let C = {j, . . . , k} ⊆ Ag be any coalition and σ′ C be any possible deviation of C from σ. Either z C = (pay j (π( σ −C , σ′ C )), . . . , pay k (π( σ −C , σ′ C ))) is a lower bound for C or it is not.
If we have the former, by hypothesis, we know that there is i ∈ C such that pay i (π( σ −C , σ′ C )) ≤ pay i (π). Therefore, i will not have an incentive to deviate along with C \ {i} from σ, and as a consequence coalition C will not be able to beneficially deviate from σ.
If, on the other hand, z C is not a lower bound for C, then, by the definition of lower bounds, we know that it is not the case that σ′ C is a joint strategy for C such that for all strategies σ′ −C for Ag \ C, we have pay i (π( σ′ C , σ′ −C )) ≥ pay i (π( σ −C , σ′ C )), for every i ∈ C. That is, there exists i ∈ C and σ′ −C for Ag \ C such that pay i (π( σ′ C , σ′ −C )) < pay i (π( σ −C , σ′ C )). We will now choose σ′ −C so that, in addition, pay i (π) ≥ pay i (π( σ′ C , σ′ −C )) for some i.
Let z′ C = (pay j (π( σ j −C , σ′ C )), . . . , pay k (π( σ k −C , σ′ C ))) where pay i (π( σ i −C , σ′ C )) is defined to be min σ′ −C ∈Σ −C pay i (π(( σ′ −C , σ′ C ))). That is, σ i −C is a strategy for Ag \ C which ensures the lowest mean-payoff for i assuming that C is playing the joint strategy σ′ C . By construction, z′ C is a lower bound for C, since each z′ i = pay i (π( σ i −C , σ′ C )) is the greatest mean-payoff value that i can ensure for itself when C is playing σ′ C , no matter what coalition Ag \ C does. Therefore, by hypothesis, we know that for some i ∈ C we have pay i (π( σ i −C , σ′ C )) ≤ pay i (π). As a consequence, as before, i will not have an incentive to deviate along with C \ {i} from σ, and therefore coalition C will not be able to beneficially deviate from σ. Because C and σ′ C were arbitrarily chosen, we conclude that σ ∈ CORE(G), proving the right-to-left direction and finishing the proof.
With this lemma in mind, we want to determine if a given vector, z C , is in fact a lower bound and, importantly, how efficiently we can do this. That is, we want to understand the following decision problem:

LOWER-BOUND:
Given: Game G, coalition C ⊆ Ag, and vector z C ∈ Q C .
Question: Is z C a lower bound for C in G?

Using the MULTI-MEAN-PAYOFF-THRESHOLD decision problem introduced earlier, we can prove the following theorem:

Theorem 6. LOWER-BOUND is co-NP-complete.
Proof. First, we show that LOWER-BOUND lies in co-NP by reducing it to MULTI-MEAN-PAYOFF-THRESHOLD. Suppose we have an instance, (G, C, z C ), and we want to determine if it is in LOWER-BOUND. We can do this by forming the following two-player, multi-mean-payoff game, G′ = (V 1 , V 2 , v 0 , E, w′ , z k ), where:
• V 1 = St and V 2 = St × Ac C ;
• v 0 = s 0 ;
• E = {(s, (s, ac C )) | s ∈ St, ac C ∈ Ac C } ∪ {((s, ac C ), tr(s, (ac C , ac Ag\C ))) | ac Ag\C ∈ Ac Ag\C };
• w′ : E → Z |C| , with w′ i (s, (s, ac C )) = w i (s) and w′ i ((s, ac C ), tr(s, (ac C , ac Ag\C ))) = w i (s);
• z k = z C .

Informally, the two players of the game are C and Ag \ C, the vector weight function is given by aggregating the weight functions of C and the threshold is z C . Now, if in this game player 1 has a winning strategy, then there exists some strategy σ C such that for all strategies of player 2, σ Ag\C , we have that π( σ C , σ Ag\C ) is a winning path for player 1. This means that pay i (π( σ C , σ Ag\C )) ≥ z i for all i ∈ C, and it is easy to verify that this implies that z C is a lower bound for C in G. Conversely, if player 1 has no winning strategy, then for all strategies, σ C , there exists some strategy σ Ag\C such that π( σ C , σ Ag\C ) is not a winning path. This in turn implies that for some j ∈ C, we have that pay j (π( σ C , σ Ag\C )) < z j , which means that z C is not a lower bound for C in G. Additionally, note that this construction can be performed in polynomial time, giving us the co-NP upper bound.
For the lower bound, we go the other way and reduce from MULTI-MEAN-PAYOFF-THRESHOLD. Suppose we would like to determine if an instance G is in MULTI-MEAN-PAYOFF-THRESHOLD. Then we form a concurrent mean-payoff game, G′ , with k + 1 players, where the states of G′ coincide exactly with the states of G. In this game, only the 1st and (k + 1)th player have any influence on the strategic nature of the game. If the game is in a state in V 1 , player 1 can decide which state to move into next. Otherwise, if the game is in a state within V 2 , then the (k + 1)th player makes a move. Note we only allow moves that agree with moves allowed within G. Now, in G′ , the first k players have weight functions corresponding to the k weight functions of player 1 in G. The last player can have any arbitrary weight function. With this machinery in place, we ask if z k is a lower bound for {1, . . . , k}. By a similar line of reasoning to the above, it is easy to verify that G is an instance of MULTI-MEAN-PAYOFF-THRESHOLD if and only if z k is a lower bound for {1, . . . , k} in the constructed concurrent mean-payoff game. Moreover, this reduction can be done in polynomial time and we can conclude that LOWER-BOUND is co-NP-complete.
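The upper-bound construction in the proof of Theorem 6 turns each concurrent move into two turn-based moves: the coalition C commits to its actions first, and Ag \ C then resolves the transition. The Python sketch below is one possible encoding of that idea and is not taken from the paper: intermediate states are represented as pairs `(s, ac_C)`, `actions` maps each player to its action list, `tr(s, joint)` takes a dictionary from players to actions, and both edges of a round carry the weight vector of the originating state.

```python
from itertools import product


def to_turn_based(states, actions, coalition, tr, weights):
    """Build (V1, V2, E, w) for the two-player multi-mean-payoff game in
    which player 1 is the coalition and player 2 is everybody else."""
    others = [p for p in actions if p not in coalition]
    v1, v2, edges, w = set(states), set(), set(), {}
    for s in states:
        # One weight per coalition member, read off the current state
        vec = tuple(weights[p][s] for p in coalition)
        for ac_c in product(*(actions[p] for p in coalition)):
            mid = (s, ac_c)  # state where the coalition has committed
            v2.add(mid)
            edges.add((s, mid))
            w[(s, mid)] = vec
            for ac_o in product(*(actions[p] for p in others)):
                joint = dict(zip(coalition, ac_c))
                joint.update(zip(others, ac_o))
                target = tr(s, joint)
                edges.add((mid, target))
                w[(mid, target)] = vec
    return v1, v2, edges, w
```

Since every round of the concurrent game becomes exactly two edges with the same weight vector, mean-payoffs are preserved by this translation.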
We have not presented any bounds for the complexity of E-CORE in the general case. One possible reason the upper bounds have remained elusive is that, whilst player 2 can act optimally with memoryless strategies in a multi-mean-payoff game, player 1 may require infinite memory [32,33]. Given the close connection between the core in our concurrent, multi-agent setting and winning strategies in multi-mean-payoff games, this raises computational concerns for the E-CORE problem. Additionally, in [35], the authors study the Pareto frontier of multi-mean-payoff games and provide a way of constructing a representation of it, but this procedure has an exponential time dependency. The same paper also establishes Σ^P_2-completeness for the polyhedron value problem. Both of these problems appear to be intimately related to the core, and we hope to use these results to gain more insight into E-CORE in the future.
With this having been said, we conclude this section by establishing a link between traditional non-transferable utility (NTU) games and our mean-payoff games-as NTU games are very well studied, and there is a wealth of results relating to core non-emptiness in this setting [36][37][38], we hope that some of these results could be utilised in order to understand the core of mean-payoff games.
Formally, an n-person game with NTU is a function, V : P (Ag) → R |Ag| , such that:
1. For all C ⊆ Ag, V(C) is a non-empty, proper, closed subset of R |Ag| ;
2. For all C ⊆ Ag, if we have x ∈ V(C) and y ∈ R |Ag| such that y i ≤ x i for all i ∈ C, then we have y ∈ V(C);
3. The set V(Ag) \ ⋃ i∈Ag int V({i}) is non-empty and bounded.

We begin by giving a translation from mean-payoff games to NTU games. Let G be a mean-payoff game; then we define an NTU game, G NTU = V : P (Ag) → R |Ag| as follows. If C ⊆ Ag, then,
V(C) = {x ∈ R |Ag| | there exists σ C such that for all σ Ag\C , pay i (π( σ C , σ Ag\C )) ≥ x i for all i ∈ C}.
In words, V(C) consists of the set of lower bounds that C can force. Note that for an outcome x ∈ V(C), the components x i for i ∈ Ag \ C do not matter: they can be arbitrary real numbers.

Lemma 6. Let G be a game, and let G NTU be the NTU game associated with G. Then G NTU is well-defined.
Proof. We need to show that the three conditions in the definition of an NTU game hold for G NTU .
For condition (1), we see that V(C) is always non-empty by noting a coalition can always force an outcome where they achieve at least their worst possible payoff each (the vector made up of each player's lowest weight in the game). The fact that V(C) is closed follows from Theorem 4 of [35]. We also see that V(C) is a proper subset of R |Ag| , as the members of C can do no better than achieve their maximum weights.
For condition (2), suppose we have x ∈ V(C), and y ∈ R |Ag| with y i ≤ x i for all i ∈ C. If x ∈ V(C), then there exists some σ C , such that for all σ Ag\C , we have pay i (π( σ C , σ Ag\C )) ≥ x i for all i ∈ C. However, this in turn implies that pay i (π( σ C , σ Ag\C )) ≥ y i for all i ∈ C. Thus, by definition, we have y ∈ V(C).
For condition (3), let p j be the punishment value of the player j in the game G. Informally, the punishment value of a player j can be thought of as the worst payoff that the other players can inflict on that player. Alternatively, we can view the punishment value for player j as the best payoff they can guarantee themselves, no matter what the remaining players do-in this way, we can see that the punishment value is a maximal lower bound for a player.
Consider the vector p ∈ R |Ag| , where the jth component of this vector is the punishment value for player j. Naturally, this vector lies in V(Ag). Additionally, we claim that it does not lie in int V({i}) for any i ∈ Ag. For a contradiction, suppose there existed some j ∈ Ag with p ∈ int V({j}). Then there exists some ε > 0 such that for all 0 ≤ r < ε, there exists some strategy, σ r j , such that for all counter-strategies, σ −j , we have pay j (π(σ r j , σ −j )) ≥ p j + r. However, this implies that player j can achieve a better payoff than their punishment value, a contradiction. Thus, we see that the set V(Ag) \ ⋃ i∈Ag int V({i}) is non-empty.
Finally, to see that V(Ag) \ ⋃ i∈Ag int V({i}) is bounded, we claim that it is contained in a closed ball of radius M, where M is defined to be: We show that if x ∈ V(Ag), then we either have x ∈ ⋃ i∈Ag int V({i}) or x ∈ B(0, M), i.e., the closed ball of radius M, centred at the origin. If x ∈ V(Ag), then by definition, we must have x i ≤ M for all i ∈ Ag. Now, there are two possibilities: if we have x i ≥ −M for all i ∈ Ag, then we have x ∈ B(0, M). So instead suppose there exists some i ∈ Ag such that x i < −M. In this case, letting ε be any positive number such that x i + ε ≤ −M, any strategy σ i has the property that for all counter-strategies σ −i , we have pay i (π( σ −i , σ i )) ≥ x i + ε. Thus, we have x ∈ int V({i}). This implies that V(Ag) ⊆ (⋃ i∈Ag int V({i})) ∪ B(0, M), which in turn implies V(Ag) \ ⋃ i∈Ag int V({i}) ⊆ B(0, M), yielding the result.
Given that we can translate mean-payoff games into well-defined NTU games, it is natural to ask whether we can use traditional cooperative game theory in order to understand the core in our setting. Thus, we introduce the (classic) definition of the core for NTU games. In an NTU game, we say that an element x ∈ R |Ag| is in the core if x ∈ V(Ag), and there exists no C ⊆ Ag and no y ∈ V(C) such that x i < y i for all i ∈ C. In the following result, we show that the core of a mean-payoff game and the core of its corresponding NTU game are intimately related:

Lemma 7. Let G be a mean-payoff game. Let G NTU be the NTU game associated with G. Then the core of G is non-empty if and only if the core of G NTU is non-empty.
Proof. First suppose that G has a non-empty core. Thus, there exists some strategy profile σ such that for all coalitions C and for all strategy vectors σ C , there exists some σ Ag\C such that pay i (π( σ C , σ Ag\C )) ≤ pay i (π( σ)) for some i ∈ C. Let x ∈ R |Ag| be such that x i = pay i (π( σ)) for all i ∈ Ag. Then by definition, we have x ∈ V(Ag). We claim that x is in the core of G NTU . Suppose there is some C ⊆ Ag and a y ∈ V(C) such that x i < y i for all i ∈ C. Then there exists some σ C such that for all σ Ag\C , we have pay i (π( σ C , σ Ag\C )) ≥ y i > x i = pay i (π( σ)) for all i ∈ C. However, this implies that σ is not in the core of G, which is a contradiction. Thus, x is in the core of G NTU .
Conversely, suppose that G NTU has a non-empty core. Thus, there exists some x ∈ R |Ag| such that x ∈ V(Ag) and there exists no C ⊂ Ag and no y ∈ V(C) with x i < y i for all i ∈ C. Since x ∈ V(Ag), there exists some strategy profile σ such that pay i (π( σ)) ≥ x i for all i ∈ Ag. We claim that σ is in the core of G. If it were not, then there would exist some coalition C, and some strategy vector σ C such that for all strategy vectors σ Ag\C , we have π( σ C , σ Ag\C ) ≻ i π( σ) for all i ∈ C. We then define y ∈ R |Ag| by setting y i = min σ Ag\C pay i (π( σ C , σ Ag\C )) for i ∈ C and setting y i = 0 for i ∈ Ag \ C. Then we have that y ∈ V(C) by definition. Since for all σ Ag\C , we have that π( σ C , σ Ag\C ) is strictly preferred to π( σ) by all players in C, we must have that y i > x i for all i ∈ C. However, this contradicts the fact that x is in the core of G NTU . Thus, we must have that σ is in the core of G.
As stated previously, we have been unable to determine the complexity of E-CORE in the setting of mean-payoff games. However, given the above result, we suggest a route which may bear fruit in the future. In [36][37][38][39][40], the authors reason about the core of cooperative games (in both the transferable utility and non-transferable utility settings) by appealing to the notion of a balanced set. In [37], the authors generalise this by introducing the notion of π-balancedness. Let π = {π C ∈ R |Ag| | C ⊆ Ag} be a collection of vectors such that:
For all C ⊆ Ag, and for all i ∉ C, we have π C,i = 0; 3. For all C ⊆ Ag, and for all i ∈ C, we have π C,i ≥ 0, and let 𝒞 ⊆ P (Ag) be a collection of coalitions. We say that 𝒞 is π-balanced if there exist balancing weights, λ C > 0, for each C ∈ 𝒞 such that: We then say that an NTU game, V, is π-balanced if whenever 𝒞 is a π-balanced collection, we have: ⋂ C∈𝒞 V(C) ⊆ V(Ag). In [37], the authors show that if there exists some π such that V is π-balanced, then V has a non-empty core. The condition of π-balancedness translates readily over to the setting of mean-payoff games, and so we see that if such a game is π-balanced, then it has a non-empty core. This suggests a (sound, but not complete) algorithm for detecting if a mean-payoff game has a non-empty core: somehow guess a polynomial-sized π, use a linear program to calculate the corresponding balancing weights, and then use a co-NP oracle to verify that there exists no π-balanced collection 𝒞 such that ⋂ C∈𝒞 V(C) ⊄ V(Ag). Obviously, this is not a rigorous argument, but it is suggestive of what a possible solution may look like.
Additionally, whilst π-balancedness is a sufficient condition for core non-emptiness, it is not necessary. However, in [37], the authors strengthen the condition of π-balancedness in the setting of convex-valued NTU games, to obtain a necessary and sufficient result. Given that, in mean-payoff games, the outcomes that a coalition can achieve can be expressed as a union of convex sets, this approach seems promising. However, we have been unable to obtain any results via this route.

Weighted Reactive Module Games
One problem with concurrent game structures as we have worked with them so far is that they are extremely verbose. The transition function, tr : St × Ac 1 × · · · × Ac |Ag| → St is a total function, so it has size |Ac|^|Ag| . Thus, the size of the game scales exponentially with the number of agents. In Example 1, the underlying concurrent game structure has a size of 429,981,696. If we are ever to have computational tools to support the decision problems described in this paper, then such "extensive" representations are not viable: we will require compact frameworks to represent games.
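As a rough illustration of this blow-up, the sketch below counts the entries an explicit transition table must list. The function name and the parameter values are hypothetical and are not taken from Example 1; the count assumes that every agent has the same number of actions in every state.

```python
def transition_table_entries(num_states: int, actions_per_agent: int, num_agents: int) -> int:
    """Count the (state, joint action) pairs that an explicit table for tr must cover.

    Assumption: every agent has `actions_per_agent` actions available in every state.
    """
    return num_states * actions_per_agent ** num_agents


# Hypothetical game with 10 states and 4 actions per agent: the table size
# grows exponentially in the number of agents.
for num_agents in (2, 4, 8, 16):
    print(num_agents, transition_table_entries(10, 4, num_agents))
```

Doubling the number of agents squares the joint-action factor, which is why a representation that never writes the table down explicitly is attractive.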
One natural framework we can use to induce succinctness is that of Reactive Modules [41]. Specifically, we modify the Reactive Module Games of [18] with weights on the guarded commands. We begin by walking through some preliminaries.
Reactive modules games do not use the full power of reactive modules, but instead use a subset of the reactive modules syntax, namely the simple reactive modules language (SRML) [42]. In SRML terms, agents are described by modules, which in turn consist of a set of variables controlled by the module, along with a set of guarded commands. Formally, given a set of propositional variables Φ, a guarded command g is an expression of the form: ϕ ⇝ x 1 := ψ 1 ; . . . ; x k := ψ k , where ϕ and each ψ j are propositional formulae over Φ and each x j also lies in Φ. We call ϕ the guard of g and denote it by guard(g), and we call the variables (the x j s) on the right-hand side of g the controlled variables of g, denoted by ctr(g). The idea is that under a given valuation of a set of variables, v ⊆ Φ, each module has a set of commands for which guard(g) is true (we say that they are enabled for execution). Each module can then choose one enabled command, g, and reassign the variables in ctr(g) according to the assignments given on the right-hand side of g. For instance, if ϕ were true, then the above guarded command could be executed, setting each x j to the truth value of ψ j under v. If no g is enabled, then a special guarded command g skip , which does not change the value of any controlled variable, is enabled for execution, so that modules always have an action they can take.
To define the size of a guarded command, we first define the size of a propositional formula, ϕ, denoted |ϕ|, to be the number of logical connectives it contains. Then the size of a guarded command g, written |g|, is given by |guard(g)| + |ctr(g)|.
Given a set of propositional variables, Φ, a simple reactive module, m, is a tuple (Ψ, I, U), where:
• Ψ ⊆ Φ is the set of variables controlled by m;
• I is a set of initialisation guarded commands, where for all g ∈ I, we have guard(g) = ⊤ and ctr(g) ⊆ Ψ;
• U is a set of update guarded commands, where for all g ∈ U, guard(g) is a propositional formula over Φ and ctr(g) ⊆ Ψ.
After defining a simple reactive module m, we introduce an additional command, g skip , of the form ⋀ g∈U ¬guard(g) ⇝ ∅, with ctr(g) = Ψ. The empty set on the right-hand side of the guarded command means that no variables are changed. We introduce this extra guarded command so that at each stage, every module has some action they can take. However, this is not obligatory, and if we define a reactive module where we can prove that at each step there will be an available action, then introducing g skip is not necessary. With this added, the size of a reactive module, |m|, is given by the sum of the sizes of its constituent guarded commands. Given this, an SRML arena, A, is a tuple A = (Ag, Φ, {m i } i∈Ag ), where Ag is a finite, non-empty set of agents, Φ is a set of propositional variables and each m i is a simple reactive module m i = (Φ i , I i , U i ) such that {Φ i } i∈Ag is a partition of Φ. We define the size of an arena to be the sum of the sizes of the modules within it.
With this syntactic machinery in place, we are finally ready to describe the semantics of SRML arenas. We give a brief, high-level description here-for details, please refer to [18,42].
Let v ⊆ Φ be a valuation and let i ∈ Ag be an agent. We let enabled i (v) denote the guarded commands available to agent i under the valuation v. Formally, we have: enabled i (v) = {g ∈ U i | v |= guard(g)}. We then define enabled(v) = enabled 1 (v) × · · · × enabled |Ag| (v). Given that each U i contains a command of the form g skip i as described previously, we can see that each enabled i (v) is always non-empty.
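The enabledness check can be sketched in a few lines. This is only an illustration under assumptions that are not part of the formal development: guards are modelled as Python predicates over valuations rather than as propositional formulae, commands are reduced to a (label, guard) pair, and the names `with_skip`, `enabled_i` and `enabled` are hypothetical.

```python
from itertools import product
from typing import Callable, FrozenSet, List, Tuple

Valuation = FrozenSet[str]             # the variables that are currently true
Guard = Callable[[Valuation], bool]    # a propositional formula, modelled as a predicate
Command = Tuple[str, Guard]            # (label, guard); assignments are omitted here


def with_skip(updates: List[Command]) -> List[Command]:
    """Add g_skip, whose guard is the conjunction of the negated update guards."""
    def skip_guard(v: Valuation) -> bool:
        return all(not guard(v) for _, guard in updates)
    return updates + [("skip", skip_guard)]


def enabled_i(updates: List[Command], v: Valuation) -> List[str]:
    """Labels of the commands of one module whose guard holds under v."""
    return [label for label, guard in with_skip(updates) if guard(v)]


def enabled(modules: List[List[Command]], v: Valuation) -> List[Tuple[str, ...]]:
    """All joint choices: the product of the per-module enabled sets."""
    return list(product(*(enabled_i(updates, v) for updates in modules)))


# Hypothetical two-module arena over the single variable "turn".
m1 = [("go", lambda v: "turn" in v)]
m2 = [("wait", lambda v: True), ("stop", lambda v: "turn" not in v)]

print(enabled([m1, m2], frozenset()))           # m1 has no enabled update, so it can only skip
print(enabled([m1, m2], frozenset({"turn"})))
```

Because the skip guard is true exactly when no update guard is, every per-module list is non-empty, mirroring the observation that each enabled i (v) is non-empty.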
We also define exec i (g, v), which, given a valuation v, is the valuation of Φ i given by executing g = ϕ ⇝ x 1 := ψ 1 ; . . . ; x k := ψ k . Specifically, we have: exec i (g, v) = ((v ∩ Φ i ) \ {x 1 , . . . , x k }) ∪ {x j ∈ {x 1 , . . . , x k } | v |= ψ j }. Thus, upon executing g, agent i's variables are reassigned according to the valuations of the propositional formulae on its right-hand side. However, with multiple agents, there is some strategic interaction, and we wish to understand how the actions of all the agents affect the state of the system. As such, we define joint guarded commands, which are simply a selection of guarded commands, one for each agent. That is, we write J = g 1 × · · · × g |Ag| . Similarly to before, we set exec(J, v) = exec 1 (g 1 , v) ∪ . . . ∪ exec |Ag| (g |Ag| , v).
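A minimal sketch of command execution follows, assuming that a command's right-hand side is given as a dictionary from controlled variables to predicates and that a valuation is the set of true variables. The names `exec_i` and `exec_joint` mirror the notation above but are otherwise hypothetical, as is the three-variable example.

```python
from typing import Callable, Dict, FrozenSet, List, Set

Valuation = FrozenSet[str]                  # the variables that are currently true
Formula = Callable[[Valuation], bool]       # a propositional formula, modelled as a predicate
Assignments = Dict[str, Formula]            # right-hand side: x_j := psi_j


def exec_i(assignments: Assignments, v: Valuation, controlled: Set[str]) -> FrozenSet[str]:
    """New valuation of one module's variables after executing a command under v."""
    # Controlled variables that the command does not mention keep their old value.
    untouched = {x for x in v if x in controlled and x not in assignments}
    # Assigned variables become true exactly when their formula holds under the OLD valuation.
    now_true = {x for x, psi in assignments.items() if psi(v)}
    return frozenset(untouched | now_true)


def exec_joint(joint: List[Assignments], v: Valuation, partition: List[Set[str]]) -> Valuation:
    """Union of the per-module results; all right-hand sides are evaluated under the same v."""
    result: Set[str] = set()
    for assignments, controlled in zip(joint, partition):
        result |= exec_i(assignments, v, controlled)
    return frozenset(result)


# Hypothetical example: module 1 controls {a, c}, module 2 controls {b}.
partition = [{"a", "c"}, {"b"}]
joint = [
    {"a": lambda v: "b" in v},   # a := b
    {"b": lambda v: "a" in v},   # b := a
]
print(sorted(exec_joint(joint, frozenset({"a", "c"}), partition)))
```

In the example, a and b effectively swap values because both assignments read the old valuation, while c is left alone since no command mentions it.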
We are now ready to describe how arenas 'play out'. Initially, each agent i picks a guarded command g 0 i ∈ I i and this sets each of their variables accordingly, inducing a valuation v 0 = exec(J 0 , Φ). The agents then each pick a guarded command g 1 i ∈ enabled i (v 0 ) and execute them, inducing another valuation, v 1 = exec(J 1 , v 0 ). They repeat this ad infinitum. As such, this induces a path of the game. However, unlike before, where we defined paths over states of the game, here we define paths over joint guarded commands, π : N → (I 1 ∪ U 1 ) × · · · × (I |Ag| ∪ U |Ag| ). Whilst, superficially, this may look like a departure from our previous convention, it is not. Given that these games are entirely deterministic, if we know the sequence of joint guarded commands that have been taken, we can infer the sequence of states. Additionally, knowing the sequence of joint guarded commands provides us with more information than knowing the sequence of states, since a state may have multiple joint guarded commands that lead to it. All of the techniques we developed before transfer readily to this new setting, so we take it for granted that there is a straightforward link between the two approaches and will not comment on it further.
We can now define weighted reactive module games. A weighted reactive module game (WRMG), G = (A, {w i } i∈Ag ), is an SRML arena, A = (Ag, Φ, {m i } i∈Ag ), along with a set of weight functions, with w i : I i ∪ U i → Z. That is, each module has an assigned weight function that maps commands to integers. As before, a player's payoff is given by the mean-payoff of the weights attached to a path.
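For intuition, the payoff of an ultimately periodic ("lasso") path can be computed directly. The sketch below assumes the usual definition of mean-payoff as the limit of prefix averages, which for a lasso path equals the average weight on the repeated cycle; the function names and the example weights are hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def mean_payoff_lasso(prefix: Sequence[int], cycle: Sequence[int]) -> Fraction:
    """Mean-payoff of the weight sequence prefix . cycle . cycle . ...

    The finite prefix does not influence the limit, only the cycle does.
    """
    if not cycle:
        raise ValueError("the repeated part of a lasso path must be non-empty")
    return Fraction(sum(cycle), len(cycle))


def prefix_average(prefix: Sequence[int], cycle: Sequence[int], n: int) -> Fraction:
    """Average of the first n weights of the same sequence, for comparison."""
    weights = [
        prefix[t] if t < len(prefix) else cycle[(t - len(prefix)) % len(cycle)]
        for t in range(n)
    ]
    return Fraction(sum(weights), n)


# Hypothetical weights collected by one player along a path.
prefix, cycle = [0, 5], [1, -1, 3]
print(mean_payoff_lasso(prefix, cycle))
print(float(prefix_average(prefix, cycle, 3002)))
```

The second value approaches the first as n grows, which is why only the commands repeated forever matter for a player's payoff.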
Note that we are effectively assigning weights to transitions, rather than to states as we did before. This is not a huge conceptual shift, and moving between the two representations (weights on states and weights on transitions) is a relatively straightforward transformation.
Finally, we need to define ω-regular specifications in the context of WRMGs. Sets of states are already conveniently parameterised by the propositional variables of Φ, so we introduce specifications which are Boolean combinations of atoms of the form Inf(p) with p ∈ Φ. Moreover, since if p is a proposition, then ¬p is also a proposition, we use the shorthand Inf(¬p), with the obvious interpretation, in place of the previous Inf(F) notation. The semantics of these specifications are defined in a nearly identical way to before.
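On an ultimately periodic path, such a specification can be checked by inspecting only the valuations on the repeated cycle, since a literal holds infinitely often exactly when it holds somewhere on that cycle. The sketch below assumes a hypothetical nested-tuple encoding of specifications; neither the encoding nor the function name comes from the formal definitions.

```python
from typing import FrozenSet, Sequence

Valuation = FrozenSet[str]   # the variables that are true in a state

# Hypothetical encoding of specifications as nested tuples:
#   ("inf", "p", True)   stands for Inf(p)
#   ("inf", "p", False)  stands for Inf(not p)
#   ("not", s), ("and", s1, s2, ...), ("or", s1, s2, ...)


def holds(spec: tuple, cycle: Sequence[Valuation]) -> bool:
    """Evaluate a Boolean combination of Inf atoms on the cycle of a lasso path."""
    op = spec[0]
    if op == "inf":
        _, p, positive = spec
        # The literal holds infinitely often iff it holds in some state of the cycle.
        return any((p in v) == positive for v in cycle)
    if op == "not":
        return not holds(spec[1], cycle)
    if op == "and":
        return all(holds(s, cycle) for s in spec[1:])
    if op == "or":
        return any(holds(s, cycle) for s in spec[1:])
    raise ValueError(f"unknown operator: {op!r}")


# Inf(p1) and Inf(not p1): the variable p1 must keep switching value forever.
spec = ("and", ("inf", "p1", True), ("inf", "p1", False))
print(holds(spec, [frozenset({"p1"}), frozenset()]))
print(holds(spec, [frozenset({"p1"})]))
```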
Before considering the decision problems relating to WRMGs, let us walk through an example to demonstrate their conciseness and utility. We do this by revisiting Example 1.

Example 2.
In Example 1, the state of the game is entirely described by the position of each of the four robots, whether they are holding a parcel or not, and whether they have crashed. Thus, we define four reactive modules m 1 , . . . , m 4 with m i = (Φ i , I i , U i ) as follows. We set Φ i = {x i,1 , . . . , x i,12 , p i , d i , c i }, where the x i,n model which node the robot is in, numbered top-to-bottom, left-to-right with respect to the diagram, p i denotes if the robot is carrying a parcel or not, and c i denotes if the robot has crashed or not. With this defined, we can define one initialisation command for each robot: x i,1 := ⊥; . . . ; x i,n := ⊤; . . . ; x i,12 := ⊥; p i := ⊥; c i := ⊥ [0], where n is appropriately set, given the starting position of the robot. Additionally, the [0] at the end of the guarded command denotes the weight awarded for performing that command. Then for each agent i and edge (x n , x m ) of the graph, we define a guarded command, We also model picking up and delivering a parcel, as well as crashing into another robot. We do this with the following commands: where i ranges over players; j, k and l range over the other players; and j ranges from 1 to 12. We also have the g skip command from before, so the robot can stay still on a node for a time step.
We can also rewrite the specification of Equation (1) in our new setting as follows: ⋀ i∈[4] (Inf(p i ) ∧ Inf(¬p i )).
It is easy to see that this setup models the example from before, is exponentially more concise, requiring 52 guarded commands in total, and is natural to work with. Note that we could save even more space by encoding the robots' positions in binary, at the expense of making our guarded commands slightly more complicated. Whilst this technique may be useful for larger systems, we give a unary encoding here for clarity. Moreover, our specification has not increased in size.
With WRMGs now adequately motivated, the main decision problem to consider is the following:
WRMG-E-NASH:
Given: WRMG G, and ω-regular specification α.
Question: Does there exist a σ ∈ NE(G) such that π( σ) |= α?

Theorem 7. The WRMG-E-NASH problem lies in NEXPTIME and is EXPTIME-hard.
Proof. The idea is to 'blow up' the simple reactive module arena into a concurrent game structure, then apply the same techniques as before. Explicitly, given a WRMG, G = (Ag, Φ, {m i } i∈Ag , {w i } i∈Ag ), we form a graph (V, E) as follows. We set V = P (Φ ∪ {p init }) and then introduce an edge, e = (v, w), if there exists some joint guarded update command J such that exec(J, v) = w. We also introduce edges ({p init }, v), if there exists some joint guarded initialisation command J such that exec(J, ∅) = v. We then form a concurrent game structure, G′ = (Ag′, St, s 0 , (Ac i ) i∈Ag′ , tr) in the natural way. We set Ag′ = Ag and St = V. We also set s 0 = {p init }. The actions for each player, Ac i , consist of their respective guarded commands and the transition function corresponds to the edges of the graph. The weights of the guarded commands are attached to the weights of the transitions in the concurrent game structure. (Whilst our analysis of E-NASH is couched in terms of weights on the states, in the proof of it, we modify the concurrent game structure so the weights exist on the transitions instead. Thus, this reduction does not need to introduce extra states to convert the weights on the transitions into weights on the states; we simply 'cut the corner' instead.)
We also need to transform the ω-regular specification on the weighted reactive module game, α, to a specification on the concurrent game structure, α′. To do this, for each propositional variable p, define a subset F p ⊆ St where v ∈ F p if p ∈ v. Then we simply replace every occurrence of Inf(p) in α with Inf(F p ).
It should be apparent that (G′, α′) exhibits the same behaviours and qualities as (G, α) and is exponential in size relative to (G, α). With this expansion complete, we can straightforwardly apply the NP algorithm of Theorem 2, immediately giving us the NEXPTIME upper bound.
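The expansion step can be sketched as a brute-force enumeration. The code below is an illustration only: it omits p init, the initialisation commands and the weights, models formulae as Python predicates, and uses hypothetical names (`explicit_graph`, `successor`) and a hypothetical two-variable arena.

```python
from itertools import combinations, product
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Valuation = FrozenSet[str]
Formula = Callable[[Valuation], bool]
Command = Tuple[Formula, Dict[str, Formula]]   # (guard, assignments)
Module = Tuple[Set[str], List[Command]]        # (controlled variables, update commands)

SKIP: Command = (lambda v: True, {})           # stands in for g_skip: changes no variable


def powerset(variables: Set[str]) -> List[Valuation]:
    items = sorted(variables)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def successor(v: Valuation, joint) -> Valuation:
    """Execute one command per module, all evaluated under the same valuation v."""
    nxt: Set[str] = set()
    for controlled, (_, assignments) in joint:
        nxt |= {x for x in v if x in controlled and x not in assignments}
        nxt |= {x for x, psi in assignments.items() if psi(v)}
    return frozenset(nxt)


def explicit_graph(modules: List[Module], variables: Set[str]):
    """Enumerate all 2^|variables| valuations and the edges induced by joint commands."""
    states = powerset(variables)
    edges = set()
    for v in states:
        choices = []
        for controlled, commands in modules:
            live = [g for g in commands if g[0](v)] or [SKIP]
            choices.append([(controlled, g) for g in live])
        for joint in product(*choices):
            edges.add((v, successor(v, joint)))
    return states, edges


# Hypothetical arena: module 1 toggles a; module 2 sets b once a is true.
modules = [
    ({"a"}, [(lambda v: True, {"a": lambda v: "a" not in v})]),
    ({"b"}, [(lambda v: "a" in v, {"b": lambda v: True})]),
]
states, edges = explicit_graph(modules, {"a", "b"})
print(len(states), len(edges))
```

Even in this tiny case the state set is the full powerset of the variables, which is the source of the exponential gap between the arena and the explicit structure.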
To show hardness, we need to introduce another decision problem, PEEK-G4 [43]. A PEEK-G4 instance is a tuple (X 1 , X 2 , X 3 , ϕ). Here, X 1 and X 2 are finite, disjoint, non-empty sets of propositional variables, X 3 ⊆ X 1 ∪ X 2 gives the variables that are true in the initial state, and ϕ is a propositional formula over X 1 ∪ X 2 . The game is played by starting in the configuration given by X 3 . The players then alternate, either choosing to toggle a variable they control, or skipping a go. If on their turn, a player can make ϕ true, then they win and the game is over. The decision problem associated with this game is to determine if agent 2 has a winning strategy for a given tuple (X 1 , X 2 , X 3 , ϕ).
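For very small instances, the game just described can be solved by exhaustive search, which may help when experimenting with the reduction that follows. The sketch below rests on assumptions the text does not fix: player 1 moves first, a player wins as soon as ϕ holds after their own move (including a skip), and a play that never ends does not count as a win for player 2. The function names are hypothetical, and the search is exponential in the number of variables.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator, Set

Config = FrozenSet[str]   # the variables that are currently true


def moves(config: Config, own: Set[str]) -> Iterator[Config]:
    """Skip, or toggle exactly one variable that the moving player controls."""
    yield config
    for x in sorted(own):
        yield config ^ frozenset({x})


def player2_wins(x1: Set[str], x2: Set[str], x3: Set[str],
                 phi: Callable[[Config], bool]) -> bool:
    """Least-fixed-point computation of the positions from which player 2 can force a win."""
    variables = sorted(x1 | x2)
    configs = [frozenset(c) for r in range(len(variables) + 1)
               for c in combinations(variables, r)]
    win2 = set()   # pairs (configuration, player to move)
    changed = True
    while changed:
        changed = False
        for c in configs:
            for turn in (1, 2):
                if (c, turn) in win2:
                    continue
                if turn == 2:
                    # Player 2 needs one move that wins at once or stays inside win2.
                    ok = any(phi(n) or (n, 1) in win2 for n in moves(c, x2))
                else:
                    # Every move of player 1 must fail to win and must lead into win2.
                    ok = all(not phi(n) and (n, 2) in win2 for n in moves(c, x1))
                if ok:
                    win2.add((c, turn))
                    changed = True
    return (frozenset(x3), 1) in win2


# Hypothetical instances: player 1 owns a, player 2 owns b, nothing is true initially.
print(player2_wins({"a"}, {"b"}, set(), lambda c: "b" in c))
print(player2_wins({"a"}, {"b"}, set(), lambda c: "a" in c))
```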
The turn p variable keeps track of which player's turn it is, and the turn u variable determines if it is the umpire's turn. Finally, the two win variables keep track of whether anyone has won yet. The update commands for m 0 are defined as follows:
turn u ∧ ¬ϕ ⇝ turn u := ⊥; turn p := ¬turn p ,
turn u ∧ ϕ ∧ turn p ⇝ win 1 := ⊤,
turn u ∧ ϕ ∧ ¬turn p ⇝ win 2 := ⊤,
There is a lot going on in the above set of commands, so let us break it down: the first two commands say that play alternates between the players and the umpire. One of the players makes a move, then play goes to the umpire, who checks for wins, then play passes to the other player, and so on. The third and fourth commands say that if a player managed to satisfy ϕ on their last turn, then they have won, and the umpire can declare them the winner. The final command says that the umpire will just stay put and keep their variables constant once one player has won.
The two player modules are symmetric, so we define m 1 , with m 2 being formed in a similar way. For m 1 , we have Φ 1 = X 1 ∪ {win 2 }, with a single initialisation command: x i 1 := ⊤; . . . ; x i n := ⊤, where {x i k } 1≤k≤n = X 1 ∩ X 3 . This simply says that player 1 must set their variables as dictated by X 3 . Then for every variable x i ∈ X 1 , we introduce an update guarded command of the following form, This says that if neither player has won yet, and it is player 1's turn to play, then they can toggle one of the variables they control. Additionally, we need to allow a player not to change any of their variables, and we do this with the following guarded command, ¬(win 1 ∨ win 2 ) ⇝ ∅.
Note that player 1 can always play this command, even when it is not their turn. Finally, we need to introduce two more guarded commands that the players take when the game is over: win 1 ⇝ ∅ and win 2 ⇝ ∅. If someone has won the game, then both players ignore the turn system and have the lone option of following exactly one of the above two commands forever.
With this defined, we need to introduce weight functions for each of the three players. For the umpire, we give a constant weight function (so they are ambivalent between all outcomes) and for player 1 we give them a weight function w 1 such that w 1 (win 1 ⇝ ∅) = 1, w 1 (win 2 ⇝ ∅) = −1, and w 1 (g) = 0 for all other g, with player 2's weight function being defined in the dual way. Finally, the ω-regular specification is simply α = Inf(win 2 ). We claim that player 2 has a winning strategy in PEEK-G4 if and only if there exists a Nash equilibrium in the weighted reactive module game that models the given specification. Moreover, it is clear that this construction can be performed in polynomial time.
First suppose that player 2 has a winning strategy in PEEK-G4. Then m 2 can play this strategy and m 1 can play any arbitrary strategy and eventually the game will end up in state win 2 . From here, each player only has one guarded command they can play, meaning player 2 will get a payoff of 1 and player 1 will get a payoff of −1. This is a Nash equilibrium. No matter what m 1 plays, they are unable to force a win against player 2's winning strategy and so have no incentive to deviate. Additionally, player 2 is achieving their maximum payoff, and thus has no motivation to deviate. This strategy profile also models α.
Conversely, suppose there exists some Nash equilibrium which models α. This implies that the game eventually ends up in win 2 , and so player 2 has a payoff of 1 and player 1 a payoff of −1 in this equilibrium. The fact that this is an equilibrium implies that no matter what player 1 plays, they cannot increase their payoff, i.e. they always lose against the strategy player 2 is playing. This implies that player 2 has a winning strategy in PEEK-G4 and this concludes the proof.

Contributions of This Paper
This paper introduces ω-regular specifications as a natural, expressive, computationally tractable way of reasoning qualitatively about concurrent games. In particular, we establish several results within the rational verification framework, the most important of which are the complexity bounds for the E-NASH, E-CORE and WRMG-E-NASH problems.

Mean-Payoff Games
Mean-payoff games are a useful tool in the analysis of computer systems. Most work has been devoted to the study of two-player zero-sum games, which can be solved in NP ∩ co-NP [6].
Beyond such games, two kinds of mean-payoff games have been studied: multiplayer mean-payoff games, whose solution was studied with respect to Nash equilibria [5], and two-player multi-mean-payoff games [32], where the focus is on the computation of winning strategies for either player, a problem that can be solved in co-NP in the general case, and in NP for memoryless strategies.

Combined Qualitative and Quantitative Reasoning
Combining qualitative and quantitative reasoning has mainly been done by modifying players' mean-payoff with some qualitative measure. In [44], the authors consider two-player, zero-sum games, where on each path of the game, every player is assigned a pair (parity goal, mean-payoff), where each player's payoff is −∞ if the parity goal is not met, and the mean-payoff otherwise. In a similar setting, [30,31] look at multi-player concurrent games with lexicographic preferences over (parity/LTL goal, mean-payoff) tuples and look at the decision problem of determining if there exists some finite state strict Nash equilibrium. Additionally, [15] considered multi-player concurrent games where the players have mean-payoff goals, and the question is whether there is some Nash equilibrium which models some temporal specification.

ω-Regular Specifications
Games with ω-regular objectives have been studied mostly in the context of two-player games [8], where the goal of one of the players is to show that the ω-regular objective holds in the system, while the goal of the other player is to show otherwise. Such games are usually used in the context of synthesis and model-checking of temporal logic specifications. These two-player zero-sum games are rather different from ours since in our games, the ω-regular specification is not part of the goal of the players, but rather a property that an external system designer wishes to see satisfied. This completely changes the overall problem setup and explains the drastic differences in complexity between traditional games with ω-regular objectives, whose complexity can range from P (for instance, for Büchi games) to PSPACE (for instance, for Muller games), and multi-player mean-payoff games with ω-regular specifications, even for two-player zero-sum instances with constant weights.

On Rational Verification
The problem studied in this paper is called Rational Verification, which has been studied for different types of games and specification languages [10]. While rational verification with LTL goals is 2EXPTIME-complete, the problem can become considerably easier when considering simpler specification languages [15]. However, in the context of multi-player mean-payoff games, only a solution for generalised Büchi specification was known, using an encoding via GR(1) specifications, and only for Nash equilibrium. In this paper, we have extended such results to account for all ω-regular specifications, and have provided results for cooperative games and succinct representations. With respect to the former, the only relevant related work is [17], where the core for concurrent game structures was introduced. Furthermore, regarding the latter, a comprehensive study using reactive modules games can be found in [18].

Future Work
There are multiple interesting avenues for future research. The two most obvious open problems are that of determining the complexity bounds for the E-CORE problem in the general case, as well as closing the complexity gap for the WRMG-E-NASH problem. Following this, there are other directions that appear to be fruitful, as well as interesting and worthwhile-for example, introducing both imperfect information and nondeterminism offers a closer approximation to real-world systems, as well as raising interesting mathematical questions. We are also interested in using ω-regular specifications to understand ω-regular games in a unified, principled way.

Conflicts of Interest:
The authors declare no conflict of interest.

mp(β) : The mean-payoff of an infinite sequence of real numbers, β; formally defined in Section 2, but can be thought of informally as a sort of infinite average
NE(G) : The set of Nash equilibria of a game G
|= : The modelling relation; informally, π |= α means that the path π adheres to the behaviour described in α; defined formally in Section 2
w i : The weight function of player i, which defines that player's preferences
M : A concurrent game structure
Ag : The set of agents in a given concurrent game structure
St : The set of game states in a given concurrent game structure
s 0 : The start state of a concurrent game structure
Ac i : The actions available to player i in a given concurrent game structure
tr : The transition function of a given concurrent game structure; takes the current state of the game, along with an action for each agent/player, and outputs a new state for the game
A : An SRML arena
m i : A reactive module corresponding to player i
I i : A set of 'initialisation commands' in the reactive module corresponding to player i
U i : A set of 'update commands' in the reactive module corresponding to player i
guard(g) : The 'guard' of the guarded command g, that is, the precondition for taking the action defined by g
ctr(g) : The 'controlled' variables of the guarded command g, that is, the propositional variables that will be affected upon executing g
enabled i (v) : The guarded commands available to player i under the valuation v
exec i (g, v) : The valuation of player i's variables obtained by executing the guarded command g under the valuation v