On the Nash Equilibria of a Duel with Terminal Payoffs

Abstract: We formulate and study a two-player duel game as a stochastic game with terminal payoffs. Players $P_1$, $P_2$ are standing in place and, in every turn, each may shoot at the other (in other words, abstention is allowed). If $P_n$ shoots $P_m$ ($m \neq n$), either they hit and kill them (with probability $p_n$) or they miss and $P_m$ is unaffected (with probability $1 - p_n$). The process continues until at least one player dies; if no player ever dies, the game lasts an infinite number of turns. Each player receives a positive payoff upon killing their opponent and a negative payoff upon being killed. We show that the unique stationary equilibrium is for both players to always shoot at each other. In addition, we show that the game also possesses "cooperative" (i.e., non-shooting) non-stationary equilibria. We also discuss a certain similarity that the duel has to the iterated Prisoner's Dilemma.


Introduction
In this paper, we study a two-player duel game played in turns. Players $P_1$, $P_2$ are standing in place and, in each turn, each player may shoot at the other; in other words, abstention is allowed. If $P_n$ shoots at $P_m$ ($m \neq n$), either they hit and kill them or they miss and $P_m$ is unaffected; the respective probabilities are $p_n$ ($P_n$'s marksmanship) and $1 - p_n$. The process continues until at least one player dies; if no player ever dies then the game lasts an infinite number of turns. We formulate the above as a stochastic game with terminal payoffs. The precise game rules and players' payoffs will be presented in Section 2.
Little work has been done on the duel. In fact, to the best of our knowledge, it has only been studied as a preliminary step in the study of the "truel", in which three stationary players shoot at each other. In early works on the truel [1–4], the postulated game rules guarantee the existence of exactly one survivor ("winner"). In an important early paper [5], the somewhat paradoxical result of "survival of the weakest" is established; namely, for certain marksmanship combinations, the player with the lowest marksmanship has the highest probability of survival. A more general analysis appears in a further study [6], which considers the possibility of "cooperation" between the players, in the sense that each player has the option of abstaining, i.e., not shooting at their opponent in one or more turns of the game. This idea is further studied by Kilgour for the simultaneous truel [7] and for the sequential truel [8,9]. These papers are, to the best of our knowledge, the first to address the truel problem using a rigorous game theoretic analysis. Kilgour formulates both the simultaneous and sequential truel as stochastic games with terminal payoffs (i.e., the players receive a single payoff at the end of the game) and obtains Nash equilibria, under appropriate conditions. A similar analysis appears in a further study [10], where, however, the truel is formulated as a discounted stochastic game. Recent papers on the truel include: Refs. [11–14] where, among other innovations, the truel is formulated as an extensive form game; Refs. [15–18], where a Markov chain formulation of several truel variants is presented; and Refs. [19–21], in which truels among N players are studied, with each player being represented by a node in a scale-free network.
Several applications of the duel and, more frequently, of the truel have been proposed in the above literature. The truel has been used to model behavior in confrontation situations [25] and in political conflicts [26]. A truel variant has been used as a model of opinion dissemination [17]. Business applications have been presented in a further study [27], in which it is shown that, under certain conditions, weaker companies can grow stronger and stronger companies can grow weaker, with all the parties eventually converging. In legal studies, the truel has been used to explore equality issues [28]. Last but not least, the nuel (an N-person generalization of the duel and truel) has been used in biology to explain the maintenance of variation in natural populations [29] and to study marriage and reproduction mechanisms [30]. Furthermore, the truel is relevant to the existence of "suicidal strategies" employed by cells and bacteria [31,32].
A common characteristic of all the above-mentioned works is that they limit themselves to the study of stationary strategies. As we will show in the current paper, the duel also possesses Nash equilibria in non-stationary strategies, and it is plausible that the same is true of the truel and the nuel.
While the above papers focus on various forms of the truel, we believe that the duel is interesting in its own right and has not received the attention it deserves. In particular, we will show that, under our formulation, the duel has a certain similarity to the iterated Prisoner's Dilemma (IPD) and possesses "cooperative" Nash equilibria in non-stationary strategies.
In this paper, we study two versions of the duel with terminal payoffs. The rest of the paper is structured as follows. In Section 2, we define the game rigorously. In Section 3, we establish the existence of equilibria in stationary strategies. In Section 4, we discuss some similarities between the game and the IPD. In Section 5, we prove that the duel also has equilibria in non-stationary strategies (namely grim cooperation and Tit-for-Tat). In Section 6, we summarize our results and propose some future research directions.

1. The game stays in state 11 ad infinitum (no player is ever killed);
2. At some $t$ the game moves to a state $s(t) \in \{10, 01, 00\}$ (one or both players are killed). These are terminal states, i.e., as soon as they are reached, the game terminates.
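When the game reaches a terminal state $s$, $P_n$ ($n \in \{1, 2\}$) receives payoff $q_n(s)$ as follows:

$$q_1(10) = a_1, \quad q_1(01) = -b_1, \quad q_1(00) = a_1 - b_1,$$
$$q_2(10) = -b_2, \quad q_2(01) = a_2, \quad q_2(00) = a_2 - b_2,$$

where we assume that for $n \in \{1, 2\}$, $a_n > 0$ and $b_n > 0$. We set $a = (a_1, a_2)$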

)s(1)...f(T)s(T), a non-terminal finite history is an h = s(0)f(1)s(1)...f(T)s(T)
where $s(T) = 11$ and an infinite history is an $h = s(0)f(1)s(1)\ldots$ . An admissible history is one which conforms to the game rules; the set of all admissible finite (resp. infinite) histories is denoted by H * (resp. H ∞ ); H * denotes the set of all non-terminal finite histories. The set of all histories is H = H * ∪ H ∞ . It will be useful to define payoff as a function $Q_n : H \to \mathbb{R}$ as follows:

$$Q_n(h) = \begin{cases} q_n(s(T)) & \text{if } h \text{ is a finite history ending in a terminal state } s(T), \\ 0 & \text{otherwise.} \end{cases}$$

Note that if the game never terminates, both players receive zero payoff.
A strategy for $P_n$ is a function $\sigma_n$ : H * → [0, 1] that assigns to every non-terminal finite history $h$ the probability that $P_n$ shoots at $P_{-n}$ when the current history is $h$: $\sigma_n(h) = \Pr(\text{"}P_n \text{ shoots } P_{-n}\text{"})$.
A stationary strategy is a $\sigma_n$ depending only on the current state $s$, hence we simply write $\sigma_n(s)$. Such a strategy is fully determined by the values $\sigma_n(s)$ for $s \in \{00, 01, 10, 11\}$, i.e., by $\sigma_n(00)$, $\sigma_n(01)$, $\sigma_n(10)$, $\sigma_n(11)$.
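But any admissible strategy (i.e., one compatible with the game rules) must assign zero shooting probability to the terminal states:

$$\sigma_n(00) = \sigma_n(01) = \sigma_n(10) = 0.$$

Consequently, a stationary strategy is determined by a single number $x_n = \sigma_n(11)$.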
A strategy profile is a vector $\sigma = (\sigma_1, \sigma_2)$. We denote the set of all admissible strategies by Σ and the set of all admissible stationary strategies by Σ.
An initial state $s(0)$ and two strategies $\sigma_1$ and $\sigma_2$ (used, respectively, by $P_1$ and $P_2$) determine a probability measure on the set of all histories; hence we can define the expected payoffs

$$\forall n \in \{1, 2\}: \quad V_n(\sigma_1, \sigma_2) = \mathbb{E}\left[Q_n(h)\right].$$

We have, thus, formulated the terminal payoffs duel as a game. We are interested in the game that starts at $s(0) = 11$, which we will denote by $\Gamma(p, a, b)$. We assume that $P_1$ and $P_2$ are looking for a Nash equilibrium (NE), i.e., a strategy profile from which no player can increase their expected payoff by unilaterally changing their own strategy.

Stationary Equilibria
As already noted, an admissible stationary strategy $\sigma_1$ for $P_1$ is fully determined by $x_1 = \sigma_1(11) = \Pr(P_1 \text{ shoots } P_2)$; i.e., $\sigma_1$ is determined by a single variable $x_1 \in [0, 1]$. Similarly, every admissible stationary strategy $\sigma_2$ for $P_2$ is fully determined by a single variable $x_2 \in [0, 1]$. Hence, we will often speak of the strategy $x_n$ (rather than $\sigma_n$) and the strategy profile $(x_1, x_2)$ (rather than $(\sigma_1, \sigma_2)$). When $P_1$ and $P_2$ use strategies $x_1$ and $x_2$, the state sequence is a Markov chain; ordering the states as 00, 01, 10, 11, we have the transition probability matrix

$$P(x_1, x_2) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ x_1 p_1 x_2 p_2 & (1 - x_1 p_1)\, x_2 p_2 & x_1 p_1 (1 - x_2 p_2) & (1 - x_1 p_1)(1 - x_2 p_2) \end{pmatrix}.$$

If $(x_1, x_2) \neq (0, 0)$ then we have the following equation for $V_1$ (temporarily omitting the $x_1, x_2$ arguments for brevity of notation):

$$V_1 = a_1 x_1 p_1 \big(x_2 (1 - p_2) + (1 - x_2)\big) - b_1 x_2 p_2 \big(x_1 (1 - p_1) + (1 - x_1)\big) + (a_1 - b_1)\, x_1 p_1 x_2 p_2 + V_1 \big(x_1 (1 - p_1) + (1 - x_1)\big)\big(x_2 (1 - p_2) + (1 - x_2)\big).$$

The equation is obtained as follows: the expected payoff from state 11 is the sum of four terms:

1. The transition to state 10 gives payoff $a_1$ and takes place with probability $x_1 p_1$ ($P_1$ shot and hit $P_2$) multiplied by $x_2 (1 - p_2) + (1 - x_2)$ ($P_2$ either shot and missed or did not shoot);
2. The transition to state 01 gives payoff $-b_1$ and takes place with probability $x_2 p_2$ ($P_2$ shot and hit $P_1$) multiplied by $x_1 (1 - p_1) + (1 - x_1)$ ($P_1$ either shot and missed or did not shoot);
3. The transition to state 00 gives payoff $a_1 - b_1$ and takes place with probability $x_1 p_1$ ($P_1$ shot and hit $P_2$) multiplied by $x_2 p_2$ ($P_2$ shot and hit $P_1$);
4. The transition to state 11 gives payoff $V_1$ (it is as if the game starts from the beginning) and takes place with probability $x_1 (1 - p_1) + (1 - x_1)$ ($P_1$ either shot and missed or did not shoot) multiplied by $x_2 (1 - p_2) + (1 - x_2)$ ($P_2$ either shot and missed or did not shoot).
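After some algebra, the $V_1$ equation is simplified to

$$(x_1 p_1 + x_2 p_2 - x_1 p_1 x_2 p_2)\, V_1 = a_1 x_1 p_1 - b_1 x_2 p_2$$

and has the following solution:

$$V_1(x_1, x_2) = \begin{cases} \dfrac{a_1 x_1 p_1 - b_1 x_2 p_2}{x_1 p_1 + x_2 p_2 - x_1 p_1 x_2 p_2} & \text{if } (x_1, x_2) \neq (0, 0), \\ 0 & \text{if } (x_1, x_2) = (0, 0). \end{cases} \tag{1}$$

By the same argument, with the roles of the players exchanged,

$$V_2(x_1, x_2) = \begin{cases} \dfrac{a_2 x_2 p_2 - b_2 x_1 p_1}{x_1 p_1 + x_2 p_2 - x_1 p_1 x_2 p_2} & \text{if } (x_1, x_2) \neq (0, 0), \\ 0 & \text{if } (x_1, x_2) = (0, 0). \end{cases} \tag{2}$$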
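The closed form can be cross-checked numerically. The following is a minimal Python sketch, assuming the solution of the four-term recursion is $V_1 = (a_1 x_1 p_1 - b_1 x_2 p_2)/(x_1 p_1 + x_2 p_2 - x_1 p_1 x_2 p_2)$ (and symmetrically for $V_2$); the function names `stationary_payoffs` and `simulate` are hypothetical and chosen only for this illustration.

```python
import random


def stationary_payoffs(x1, x2, p, a, b):
    """Expected terminal payoffs (V1, V2) under stationary strategies x1, x2."""
    h1 = x1 * p[0]  # per-turn probability that P1 shoots and hits
    h2 = x2 * p[1]  # per-turn probability that P2 shoots and hits
    stop = h1 + h2 - h1 * h2  # probability that a given turn ends the game
    if stop == 0.0:
        return 0.0, 0.0  # nobody is ever killed, so both payoffs are zero
    return (a[0] * h1 - b[0] * h2) / stop, (a[1] * h2 - b[1] * h1) / stop


def simulate(x1, x2, p, a, b, runs=100_000, max_turns=1_000, seed=0):
    """Monte Carlo estimate of (V1, V2); games cut off at max_turns pay zero."""
    if x1 * p[0] == 0.0 and x2 * p[1] == 0.0:
        return 0.0, 0.0  # no shot can ever hit
    rng = random.Random(seed)
    tot1 = tot2 = 0.0
    for _ in range(runs):
        for _ in range(max_turns):
            hit1 = rng.random() < x1 * p[0]  # P1 shoots and kills P2
            hit2 = rng.random() < x2 * p[1]  # P2 shoots and kills P1
            if hit1 or hit2:
                tot1 += a[0] * hit1 - b[0] * hit2
                tot2 += a[1] * hit2 - b[1] * hit1
                break
    return tot1 / runs, tot2 / runs
```

For example, comparing `stationary_payoffs(1.0, 1.0, (0.5, 0.5), (1.0, 2.0), (3.0, 4.0))` with `simulate(...)` on the same arguments should give estimates that agree up to sampling noise.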
Proof. Suppose that $P_1$ and $P_2$ use the profile $(x_1, x_2)$. To determine whether this is an NE, from $P_1$'s point of view we have to check whether they have anything to gain by unilaterally deviating to some other strategy $\sigma_1$. A crucial fact is that we only have to check whether $P_1$ gains by switching to another stationary strategy. This is true because, if $P_2$ uses the stationary strategy $x_2$, then $P_1$ must solve a Markov Decision Process problem; it is well known that in this case they gain nothing by using non-stationary strategies [33].
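Let us first check whether $(0, 0)$ is a Nash equilibrium. If $P_1$ deviates to another stationary strategy $x_1 > 0$, we will have

$$V_1(x_1, 0) = \frac{a_1 x_1 p_1}{x_1 p_1} = a_1 > 0 = V_1(0, 0).$$

Hence, $(0, 0)$ cannot be an NE. Next, take any $(x_1, x_2) \neq (0, 0)$ and suppose $P_1$ deviates to $y_1$. Then

$$V_1(x_1, x_2) - V_1(y_1, x_2) = \frac{p_1 p_2 x_2 \big(a_1 + b_1 (1 - x_2 p_2)\big)(x_1 - y_1)}{(x_1 p_1 + x_2 p_2 - x_1 p_1 x_2 p_2)(y_1 p_1 + x_2 p_2 - y_1 p_1 x_2 p_2)}.$$

For $x_2 > 0$ the denominator is positive and the numerator has the sign of $x_1 - y_1$. Hence, the sign of $V_1(x_1, x_2) - V_1(y_1, x_2)$ is the same as that of $x_1 - y_1$ and consequently, $P_1$ never (resp. always) has an incentive to deviate from $x_1$ to a smaller (resp. greater) $y_1$. (If $x_2 = 0$, then $x_1 > 0$ and the symmetric computation shows that $P_2$ gains by increasing $x_2$.) The same arguments can be applied to $P_2$ and their strategy $x_2$. It follows that the only stationary NE is $(x_1, x_2) = (1, 1)$ and this completes the proof.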
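The conclusion of the proof can be illustrated by brute force on a grid of stationary profiles. This is only a sketch under the assumption that the stationary payoff is $(a_1 x_1 p_1 - b_1 x_2 p_2)/(x_1 p_1 + x_2 p_2 - x_1 p_1 x_2 p_2)$, with value 0 at $(0, 0)$; the helper names `payoff` and `stationary_equilibria` are hypothetical. A grid search cannot replace the proof, it merely checks that no grid profile other than $(1, 1)$ survives unilateral deviations to other grid points.

```python
def payoff(x_own, x_opp, p_own, p_opp, a_own, b_own):
    """Stationary expected payoff of a player shooting with probability x_own."""
    stop = x_own * p_own + x_opp * p_opp - x_own * p_own * x_opp * p_opp
    if stop == 0.0:
        return 0.0  # neither player can ever be hit
    return (a_own * x_own * p_own - b_own * x_opp * p_opp) / stop


def stationary_equilibria(p, a, b, steps=10, tol=1e-12):
    """Return the grid profiles (x1, x2) from which no grid deviation is profitable."""
    grid = [k / steps for k in range(steps + 1)]
    found = []
    for x1 in grid:
        for x2 in grid:
            # Best payoff each player can reach by a unilateral grid deviation.
            best1 = max(payoff(y, x2, p[0], p[1], a[0], b[0]) for y in grid)
            best2 = max(payoff(y, x1, p[1], p[0], a[1], b[1]) for y in grid)
            ok1 = payoff(x1, x2, p[0], p[1], a[0], b[0]) >= best1 - tol
            ok2 = payoff(x2, x1, p[1], p[0], a[1], b[1]) >= best2 - tol
            if ok1 and ok2:
                found.append((x1, x2))
    return found
```

With, say, `p = (0.5, 0.7)` and `a = b = (1.0, 1.0)`, the only surviving profile should be `(1.0, 1.0)`, in agreement with the proof.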

Connection to the Iterated Prisoner's Dilemma
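Applying Formulas (1) and (2) to $(x_1, x_2) \in \{(0, 0), (0, 1), (1, 0), (1, 1)\}$, we get

$$V_1(0,0) = 0, \quad V_1(0,1) = -b_1, \quad V_1(1,0) = a_1, \quad V_1(1,1) = \frac{a_1 p_1 - b_1 p_2}{p_1 + p_2 - p_1 p_2},$$
$$V_2(0,0) = 0, \quad V_2(0,1) = a_2, \quad V_2(1,0) = -b_2, \quad V_2(1,1) = \frac{a_2 p_2 - b_2 p_1}{p_1 + p_2 - p_1 p_2}.$$

It can immediately be seen that

$$V_1(1,0) > V_1(0,0) > V_1(0,1), \qquad V_2(0,1) > V_2(0,0) > V_2(1,0),$$

and if we identify the strategy $x_n = 0$ (never shooting at the opponent) with "cooperation" and the strategy $x_n = 1$ (always shooting at the opponent) with "defection", the above inequalities remind us of the Prisoner's Dilemma (PD). The similarity would be complete if the additional inequalities

$$V_1(0,0) > V_1(1,1) > V_1(0,1)$$

also held, because in this case we would have

$$V_1(1,0) > V_1(0,0) > V_1(1,1) > V_1(0,1),$$

which corresponds exactly to the well known sequence of PD inequalities [22]:

$$T > R > P > S.$$

The first inequality is equivalent to $a_1 > 0$, which is always satisfied. The second inequality is

$$\frac{a_1 p_1 - b_1 p_2}{p_1 + p_2 - p_1 p_2} < 0,$$

which will be satisfied iff $a_1 p_1 < b_1 p_2$. The third inequality is equivalent to $p_1 \big(a_1 + b_1 (1 - p_2)\big) > 0$ and is always satisfied. Similarly, the inequalities

$$V_2(0,1) > V_2(0,0) > V_2(1,1) > V_2(1,0)$$

hold iff $a_2 p_2 < b_2 p_1$. Combining the above, we get the following "PD-like condition"

$$\frac{a_1}{b_1} < \frac{p_2}{p_1} \quad \text{and} \quad \frac{a_2}{b_2} < \frac{p_1}{p_2}, \tag{5}$$

which is necessary and sufficient to have the following ordering of the payoffs:

$$V_1(1,0) > V_1(0,0) > V_1(1,1) > V_1(0,1), \tag{6}$$
$$V_2(0,1) > V_2(0,0) > V_2(1,1) > V_2(1,0). \tag{7}$$

In light of (6) and (7), we will call the never-shooting strategy $x_n = 0$ (which henceforth will also be denoted by $\sigma^C$) the cooperating strategy, and the always-shooting strategy $x_n = 1$ (which henceforth will also be denoted by $\sigma^D$) the defecting strategy. The terminology is inspired by the analogy to the PD. Namely, in both the PD and the duel, both players would have a higher payoff if they adhered to $(\sigma^C, \sigma^C)$; but this is not an NE and each player has an incentive to switch to $\sigma^D$. Consequently, rational players will follow the strategy profile $(\sigma^D, \sigma^D)$, which, while being an NE, yields a lower payoff to both players.

As is well known, cooperative NE do exist for the iterated PD, and these involve the use of non-stationary strategies, such as grim cooperation and Tit-for-Tat (TfT). Hence, in the next section, we will show that there exist corresponding non-stationary cooperative strategies which are NE of $\Gamma(p, a, b)$.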
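The PD-like ordering is easy to test for concrete parameter values. The sketch below assumes the four corner payoffs are $0$, $-b_1$, $a_1$ and $(a_1 p_1 - b_1 p_2)/(p_1 + p_2 - p_1 p_2)$ for $P_1$ (and symmetrically for $P_2$), as obtained by evaluating the stationary payoff at the pure profiles; the names `corner_payoffs` and `is_pd_like` are hypothetical.

```python
def corner_payoffs(p, a, b):
    """Payoffs at the pure stationary profiles (0 = never shoot, 1 = always shoot)."""
    stop = p[0] + p[1] - p[0] * p[1]  # per-turn stopping probability at (1, 1)
    v1 = {(0, 0): 0.0, (0, 1): -b[0], (1, 0): a[0],
          (1, 1): (a[0] * p[0] - b[0] * p[1]) / stop}
    v2 = {(0, 0): 0.0, (0, 1): a[1], (1, 0): -b[1],
          (1, 1): (a[1] * p[1] - b[1] * p[0]) / stop}
    return v1, v2


def is_pd_like(p, a, b):
    """Check the Prisoner's-Dilemma ordering T > R > P > S for both players."""
    v1, v2 = corner_payoffs(p, a, b)
    order1 = v1[(1, 0)] > v1[(0, 0)] > v1[(1, 1)] > v1[(0, 1)]
    order2 = v2[(0, 1)] > v2[(0, 0)] > v2[(1, 1)] > v2[(1, 0)]
    return order1 and order2
```

For instance, with `p = (0.5, 0.5)` and `b = (2.0, 2.0)`, the ordering holds for `a = (1.0, 1.0)` but fails for `a = (3.0, 1.0)`, where $a_1 p_1 > b_1 p_2$.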
Before concluding this section, it is worth discussing in what ways our duel game $\Gamma(p, a, b)$ differs from the IPD. Three obvious differences are:

1. The IPD is a deterministic game, while $\Gamma(p, a, b)$ involves randomness;
2. In the IPD, each player receives a payoff in every turn and the total payoff is the discounted (by a discount factor $\gamma$) sum of turn payoffs, while in $\Gamma(p, a, b)$, payoff is obtained only at the final turn and is undiscounted;
3. The IPD will last an infinite number of turns, while $\Gamma(p, a, b)$ may (depending on the $p$ values and the strategy used) terminate in a finite number of turns (in fact, it may be the case that it will terminate in a finite number of turns with probability one).
However, there is a formulation of the IPD in which the payoffs are not discounted but the game may terminate in every turn with a positive probability $p = 1 - \gamma > 0$. In this formulation, the IPD is also a random game and will terminate in a finite number of turns with probability one; the total expected payoff of each player equals the discounted payoff of the deterministic IPD version.
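The equivalence between discounting and random termination can be checked numerically. The sketch below assumes one particular timing convention (after each turn is played, the game continues with probability $\gamma$ and stops with probability $1 - \gamma$), so that turn $t = 0, 1, \ldots$ is reached with probability $\gamma^t$; the function names and the truncation of the payoff stream to a finite list are choices made for this illustration only.

```python
import random


def discounted_total(payoffs, gamma):
    """Discounted sum of a (finite prefix of a) stream of per-turn payoffs."""
    return sum(r * gamma ** t for t, r in enumerate(payoffs))


def random_stopping_total(payoffs, gamma, runs=100_000, seed=0):
    """Average undiscounted total when the game stops after each turn w.p. 1 - gamma."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        for r in payoffs:
            total += r  # the current turn is played and paid in full
            if rng.random() >= gamma:  # the game ends with probability 1 - gamma
                break
    return total / runs
```

With a constant per-turn payoff of 3 and $\gamma = 0.9$, both quantities should be close to $3 / (1 - \gamma) = 30$, the simulated one up to sampling noise.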

Non-Stationary Equilibria
Drawing upon similar results for the IPD, we will now show that the duel has cooperative NE in non-stationary strategies. The first such strategy we introduce is the grim cooperation strategy $\sigma^G$, which is defined as follows for $P_n$ ($n \in \{1, 2\}$): As long as $P_{-n}$ does not shoot $P_n$, $P_n$ never shoots $P_{-n}$; if $P_{-n}$ shoots $P_n$ at round $t$, then $P_n$ shoots $P_{-n}$ at all rounds $t' > t$.
This strategy was originally used in the analysis of the IPD.
Proof. We have

$$V_n(\sigma^G, \sigma^G) = 0 \quad \text{for } n \in \{1, 2\},$$

since, if both players adhere to $\sigma^G$, nobody will ever get killed. Next, let us consider possible $P_1$ strategies $\sigma_1$ deviating from $\sigma^G$. It is easy to see that it suffices to consider the strategy $\sigma^D$, because, as soon as $P_1$ deviates from $\sigma^G$, $P_2$ will shoot at $P_1$ on every turn and hence, $P_1$ has no incentive to not shoot; furthermore, if $P_1$ deviates from $\sigma^G$, they might as well deviate on the first turn. Now, let us compute $V_1(\sigma^D, \sigma^G)$. If $P_1$ uses $\sigma^D$ at $t = 1$, then $P_2$ will also revert to $\sigma^D$ at times $t \in \{2, 3, \ldots\}$. Hence, $P_1$'s expected payoff will be

$$V_1(\sigma^D, \sigma^G) = p_1 a_1 + (1 - p_1) \frac{a_1 p_1 - b_1 p_2}{p_1 + p_2 - p_1 p_2}.$$

By assumption (8), this payoff does not exceed $V_1(\sigma^G, \sigma^G) = 0$. Hence, (9) holds and $P_1$ has no incentive to deviate from $\sigma^G$. By a similar analysis, we can also show that $P_2$ has no incentive to deviate from $\sigma^G$. This completes the proof.
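The deviation payoff used in the proof can be evaluated directly. The sketch below assumes that a first-turn defection against grim cooperation yields $p_1 a_1 + (1 - p_1) V_1(1, 1)$, with $V_1(1, 1) = (a_1 p_1 - b_1 p_2)/(p_1 + p_2 - p_1 p_2)$. The threshold in `grim_threshold` is obtained here by rearranging "deviation payoff $\le 0$"; it is a derived quantity for illustration and is not claimed to be the exact form of condition (8). All function names are hypothetical.

```python
def both_shoot_payoff(p1, p2, a1, b1):
    """P1's expected payoff when both players shoot in every turn."""
    return (a1 * p1 - b1 * p2) / (p1 + p2 - p1 * p2)


def defect_against_grim(p1, p2, a1, b1):
    """P1's expected payoff from always shooting while P2 plays grim cooperation."""
    # Turn 1: only P1 shoots. If P1 misses, both players shoot from turn 2 on.
    return p1 * a1 + (1.0 - p1) * both_shoot_payoff(p1, p2, a1, b1)


def grim_threshold(p1, p2):
    """Largest ratio a1/b1 for which defecting against grim does not pay (derived)."""
    return p2 * (1.0 - p1) / (p1 * (1.0 + p2 * (1.0 - p1)))


def grim_is_stable_for_p1(p1, p2, a1, b1):
    """True if P1 cannot gain by defecting against grim cooperation."""
    return defect_against_grim(p1, p2, a1, b1) <= 0.0
```

With $p_1 = p_2 = 0.5$ the derived threshold is $0.4$, so $a_1/b_1 = 0.1$ keeps cooperation stable while $a_1/b_1 = 0.5$ does not, even though the latter still satisfies the weaker PD-like requirement $a_1/b_1 < p_2/p_1 = 1$.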
Hence, the conditions (8) are stronger than the originally postulated condition (5) for the existence of a "PD-like" ordering in the duel. Now, we will define another non-stationary cooperative strategy, which will turn out to be an NE of the duel. This is the Tit-for-Tat strategy $\sigma^{TfT}$, defined for $P_n$ ($n \in \{1, 2\}$) as follows: In the first turn $P_n$ does not shoot $P_{-n}$; at every other turn $P_n$ performs the same action (shooting or not shooting) that $P_{-n}$ performed in the previous round. This strategy was also originally used in the analysis of the iterated PD.
Proof. If both players play the strategy $\sigma^{TfT}$, then they never shoot at each other and their payoffs are

$$\forall n \in \{1, 2\}: \quad V_n^C = V_n(\sigma^{TfT}, \sigma^{TfT}) = 0.$$

Now, suppose that $P_2$ adheres to $\sigma^{TfT}$ but $P_1$ deviates. If $P_1$ gains by deviating from $\sigma^{TfT}$ at some turn, then they must also gain by shooting at $P_2$ in the first turn. If they do so, then $P_2$ shoots at $P_1$ for all subsequent turns, until $P_1$ reverts to not firing. Thus, $P_1$ has two options after their first deviation.

1. They can continue shooting in all subsequent turns, in which case, so will $P_2$;
2. They can revert to not shooting, in which case, in the next turn, they are in the same situation as at the start of the game.

Consequently, if $P_1$ can increase their payoff by deviating, then they can do so, either (a) by shooting in every turn, or (b) by alternating between shooting and not shooting. If we find conditions under which $P_1$ cannot increase their payoff by either of the above strategies, then, under the same conditions, $P_1$ cannot increase their payoff by deviating, which implies that $(\sigma^{TfT}, \sigma^{TfT})$ is an NE.

1. Consider first the case in which $P_1$ adopts the strategy $\sigma^D$ of shooting in each turn. Then we have

   $$V_1(\sigma^D, \sigma^{TfT}) = V_1(\sigma^D, \sigma^G)$$

   and, by the same analysis as in the proof of Proposition 2, we know that

   $$V_1(\sigma^D, \sigma^{TfT}) \le 0 = V_1^C.$$

2. Next consider the case in which $P_1$ alternates between shooting and not shooting. Then their payoff will be

   $$V_1^S = p_1 a_1 - (1 - p_1) p_2 b_1 + (1 - p_1)(1 - p_2) V_1^S.$$

   The above equation holds because the expected payoff $V_1^S$ is computed by summing the following possibilities. $P_1$ will certainly shoot and then:

   (a) With probability $p_1$, $P_1$ will kill $P_2$ and hence, receive payoff $a_1$;
   (b) With probability $1 - p_1$, $P_1$ will miss (and receive zero payoff) and in the next turn $P_2$ will shoot and kill $P_1$; this combination has probability $(1 - p_1) p_2$ and gives to $P_1$ payoff $-b_1$;
   (c) With probability $1 - p_1$, $P_1$ will miss and in the next turn $P_2$ will shoot and miss $P_1$; this combination has probability $(1 - p_1)(1 - p_2)$ and returns the game to the original state, in which $P_1$ receives payoff $V_1^S$.

   Simplifying the above equation and solving we obtain

   $$V_1^S = \frac{a_1 p_1 - b_1 p_2 (1 - p_1)}{p_1 + p_2 - p_1 p_2}.$$

   For an NE we must have $V_1^C - V_1^S > 0$ and this will hold when

   $$a_1 p_1 < b_1 p_2 (1 - p_1).$$

   However, this inequality follows from our assumption (10).

Combining 1 and 2, we see that $P_1$ has no advantage in deviating from $\sigma^{TfT}$; by a similar analysis, the same holds for $P_2$ and hence, the proof is completed.
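The payoff of the alternating deviation can be checked by simulating the two history-dependent behaviours directly. The sketch below assumes the closed form $V_1^S = (a_1 p_1 - b_1 p_2 (1 - p_1))/(p_1 + p_2 - p_1 p_2)$, obtained by solving the three-case recursion described in the proof; the function names are hypothetical, and the simulation assumes $p_1, p_2 > 0$ so that every game ends.

```python
import random


def alternate_vs_tft_exact(p1, p2, a1, b1):
    """Closed-form payoff of P1 alternating shoot / hold against Tit-for-Tat."""
    return (a1 * p1 - b1 * p2 * (1.0 - p1)) / (p1 + p2 - p1 * p2)


def alternate_vs_tft_simulated(p1, p2, a1, b1, runs=100_000, seed=0):
    """Monte Carlo estimate of the same payoff, playing the game turn by turn."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        p1_shoots = True   # P1 deviates by shooting in the first turn
        p2_shoots = False  # Tit-for-Tat starts by not shooting
        while True:
            hit1 = p1_shoots and rng.random() < p1  # P1 kills P2 this turn
            hit2 = p2_shoots and rng.random() < p2  # P2 kills P1 this turn
            if hit1 or hit2:
                total += a1 * hit1 - b1 * hit2
                break
            # P2 copies P1's last action; P1 switches between shooting and holding.
            p1_shoots, p2_shoots = not p1_shoots, p1_shoots
    return total / runs
```

For $p_1 = p_2 = 0.5$, $a_1 = 1$, $b_1 = 2$ the closed form gives exactly $0$, the boundary case of $a_1 p_1 < b_1 p_2 (1 - p_1)$, and the simulated value should be close to zero.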
Corollary 1. The duel NE conditions (8) and (10) are the same. In other words, $(\sigma^G, \sigma^G)$ is an NE of $\Gamma(p, a, b)$ iff $(\sigma^{TfT}, \sigma^{TfT})$ is an NE of $\Gamma(p, a, b)$.
Let us compare the stationary and non-stationary NE. Initially, we made no assumption regarding the relative size of $a_n$ and $b_n$ (although we did assume they are both positive). In other words, $a_n - b_n$ may be negative ($P_n$ places more value on surviving), positive ($P_n$ hates their opponent so much that they value killing them more than surviving) or zero. However, even when $a_n < b_n$ for both $n$, if the players limit themselves to using stationary strategies, then the only Nash equilibrium consists of both players shooting at each other with probability one (by Proposition 1); for the more desirable outcome of both players surviving to be (another) Nash equilibrium, they must use non-stationary strategies.

Conclusions
We have defined a turn-based duel game with terminal payoffs and shown that it has both stationary and non-stationary Nash equilibria. The non-stationary equilibria that we have established are the grim cooperation and Tit-for-Tat pairs. These are of the same form as the strategies of the same name used in the iterated Prisoner's Dilemma; we were motivated to use these in the duel by the previously explained similarity between the payoff structure of our duel game and that of the IPD.
In addition to their independent interest, the above results have potential application to the truel and nuel problems. As we have pointed out, to the best of our knowledge, the literature on the truel and nuel is limited to the study of stationary strategies. In the case of the duel, in addition to stationary NE, we also have non-stationary NE. We reported here two such non-stationary NE, $(\sigma^G, \sigma^G)$ and $(\sigma^{TfT}, \sigma^{TfT})$, and it is not hard to construct additional ones, using an approach similar to that used in the study of repeated games [35]. We conjecture that, using the methods of the current paper, it is also possible to establish a plethora of non-stationary NE for the general, N-player nuel; we intend to pursue this research direction in the future.
Several variants of the duel can be formulated and are worth exploring. In addition to the variant described in this paper, we have explored a variant in which each player receives some discounted payoff for every turn in which they stay alive. Including those results (and the techniques required for their proof) would increase the size of the current paper inordinately; hence, they will be reported in a separate publication. Further variants to be explored in the future include:

1. sequential play, in which a single player is allowed to shoot in each turn;
2. random play, in which the player allowed to shoot in each turn is chosen randomly and equi-probably.
In addition, in the future we intend to study the use of non-stationary strategies in truels and nuels.