3.1. Definition of Correlated Equilibrium
The game G can be extended to include communication from a correlation device, which sends a profile of signals consisting of one signal to each player. Following Aumann [2, 3], let the set of signal profiles be identical to the set of action profiles A. Before the players choose their actions, the device selects a signal profile $a = (a_i, a_j, a_k) \in A$, and each player i receives their signal $a_i$ but does not observe the signals $a_{-i} = (a_j, a_k)$ received by the other players. Each player’s strategy is now represented by a function $\tau_i : A_i \to A_i$, where $\tau_i(a_i)$ is the action that i chooses when the device sends them the signal $a_i$. The particular strategy in which i chooses the action that has been signaled to them is denoted $\tau_i^*$, i.e., $\tau_i^*(a_i) = a_i$. This strategy $\tau_i^*$ is called the obedient strategy, or the signal-following strategy.
Let μ denote the correlation device’s probability distribution over A, which is common knowledge among the players. For example, $\mu(p, s, r)$ is the probability that the device sends player i a signal to play p, player j a signal to play s, and player k a signal to play r. Upon receiving their own signal $a_i$, player i can compute the conditional probabilities
$$\mu(a_j, a_k \mid a_i) = \frac{\mu(a_i, a_j, a_k)}{\sum_{(a_j', a_k') \in A_j \times A_k} \mu(a_i, a_j', a_k')}.$$
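The conditional probabilities can be computed mechanically from a table of μ. In this Python sketch, μ is a dict keyed by profiles ordered $(a_i, a_j, a_k)$, with rock, paper, and scissors written "r", "p", and "s"; the function name and the two-profile device in the example are hypothetical, chosen only to show that a signal can carry information about the other players’ signals when μ is not a product measure.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("r", "p", "s")

def conditional(mu, signal):
    """Return mu(a_j, a_k | a_i = signal) as a dict keyed by (a_j, a_k).

    Player i is the first coordinate of each profile (a_i, a_j, a_k)."""
    total = sum(prob for prof, prob in mu.items() if prof[0] == signal)
    if total == 0:
        raise ValueError("signal is never sent, so the conditional is undefined")
    return {prof[1:]: prob / total for prof, prob in mu.items() if prof[0] == signal}

# Hypothetical device: recommends (r, r, r) or (p, s, r), each half the time
mu = {prof: Fraction(0) for prof in product(ACTIONS, repeat=3)}
mu[("r", "r", "r")] = Fraction(1, 2)
mu[("p", "s", "r")] = Fraction(1, 2)

print(conditional(mu, "p")[("s", "r")])  # 1: a paper signal reveals the whole profile
```

With a product distribution, such as the uniform one, the same function returns 1/9 for every pair $(a_j, a_k)$, so the signal carries no information about the others.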
In an evolutionary framework, this view can be adapted so that the players are not relying on Bayesian rationality [2, 4]. However, I will extend the game G using the language of classical game theory in order to articulate the general definition of correlated equilibrium for rock-paper-scissors. For a particular probability distribution μ, the extended game is denoted $\Gamma_\mu$. A correlated equilibrium consists of a probability distribution μ and signal-following strategies $(\tau_1^*(\cdot), \tau_2^*(\cdot), \tau_3^*(\cdot))$ such that, for every player i and every strategy $\tau_i$:
$$\sum_{a \in A} \mu(a)\, u_i\big(\tau_i^*(a_i), a_{-i}\big) \;\ge\; \sum_{a \in A} \mu(a)\, u_i\big(\tau_i(a_i), a_{-i}\big). \tag{1}$$
In other words, a correlated equilibrium involves a probability distribution over signals from the correlation device such that none of the three players can gain in expectation by disobeying the signals they receive. From an evolutionary perspective, a strict correlated equilibrium is of particular interest: there, μ is such that the expected payoff from signal-following (the left-hand side of Equation (1)) is strictly greater than the expected payoff from any alternative strategy. It is important to note that the probability distribution μ is not exogenous but is part of the equilibrium. Furthermore, unlike the distribution over action profiles induced by a mixed-strategy Nash equilibrium, μ is not necessarily a product measure on A.
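Because each player has only $3^3 = 27$ maps from signals to actions, Equation (1) and its strict version can be checked by brute force. This Python sketch is illustrative rather than the paper’s method. It assumes a payoff table in which the solo-winner (13/4), tied-for-last (−1), and three-way-split (1/2) values follow the numerical example of Section 3.3, the three-way-tie value 1/2 is the one implied by that example’s 13/30 payoff, and the tied-for-first (1/4) and solo-last (−7/4) values are hypothetical, chosen only so that their average in a solo-loser outcome equals the example’s −5/12.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("r", "p", "s")
BEATS = {"r": "s", "p": "r", "s": "p"}  # each key beats its value

PAY = {
    "solo_win": Fraction(13, 4),    # Section 3.3
    "tied_last": Fraction(-1),      # Section 3.3
    "split": Fraction(1, 2),        # Section 3.3
    "tie": Fraction(1, 2),          # implied by Section 3.3's 13/30
    "tied_first": Fraction(1, 4),   # HYPOTHETICAL
    "solo_last": Fraction(-7, 4),   # HYPOTHETICAL
}

def payoff(profile, i):
    """u_i for an action profile such as ("p", "r", "r")."""
    kinds = set(profile)
    if len(kinds) == 1:
        return PAY["tie"]
    if len(kinds) == 3:
        return PAY["split"]
    solo = next(a for a in kinds if profile.count(a) == 1)
    pair = next(a for a in kinds if profile.count(a) == 2)
    solo_wins = BEATS[solo] == pair
    if profile[i] == solo:
        return PAY["solo_win"] if solo_wins else PAY["solo_last"]
    return PAY["tied_last"] if solo_wins else PAY["tied_first"]

def expected_payoff(mu, i, tau):
    """Sum over a of mu(a) * u_i(tau(a_i), a_-i); the other two players obey."""
    total = Fraction(0)
    for prof, prob in mu.items():
        played = prof[:i] + (tau[prof[i]],) + prof[i + 1:]
        total += prob * payoff(played, i)
    return total

def is_strict_ce(mu):
    """Strict version of Equation (1): for every player, obedience must
    strictly beat all 26 other maps tau_i: {r, p, s} -> {r, p, s}."""
    obedient = {a: a for a in ACTIONS}
    for i in range(3):
        base = expected_payoff(mu, i, obedient)
        for image in product(ACTIONS, repeat=3):
            tau = dict(zip(ACTIONS, image))
            if tau != obedient and expected_payoff(mu, i, tau) >= base:
                return False
    return True

# Uniform mu is the distribution induced by the MSNE: obedience is a best
# response, but not a strict one, since every strategy earns the same payoff.
uniform = {prof: Fraction(1, 27) for prof in product(ACTIONS, repeat=3)}
print(expected_payoff(uniform, 0, {a: a for a in ACTIONS}), is_strict_ce(uniform))
# prints: 1/6 False
```

Note that a signal the device never sends makes strictness fail automatically, since deviating after that signal costs nothing; the check is only informative for distributions that send every signal with positive probability.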
A strict correlated equilibrium is defined by a set of strategic incentive constraints for the players [6, 21]. Each incentive constraint compares a player’s expected payoff from following a particular signal with the expected payoff from choosing a particular alternative action that was not recommended. Each player has three possible actions and therefore two alternatives to signal-following for each signal that the device could send, giving six constraints per player.
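Strategic incentive constraints for player i to follow a signal to play rock:
$$\sum_{(a_j, a_k) \in A_j \times A_k} \mu(r, a_j, a_k)\,\big[u_i(p, a_j, a_k) - u_i(r, a_j, a_k)\big] < 0 \tag{2a}$$
$$\sum_{(a_j, a_k) \in A_j \times A_k} \mu(r, a_j, a_k)\,\big[u_i(s, a_j, a_k) - u_i(r, a_j, a_k)\big] < 0 \tag{2b}$$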
Under a strict correlated equilibrium, player i’s expected payoff would decline if they were to choose paper when the device had signaled to play rock; this is the meaning of (2a). Each bracketed utility differential in (2a) is the change in player i’s payoff that would result from deviating to paper when the correlation device recommended the indicated action profile and players j and k followed their recommendations. Similarly, (2b) indicates that player i’s expected payoff would decline if they were to choose scissors when the device had signaled to play rock. The bracketed terms in (2b) are the payoff changes that would result from player i deviating to scissors when rock was recommended, while players j and k obeyed their recommendations.
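Strategic incentive constraints for player i to follow a signal to play paper:
$$\sum_{(a_j, a_k) \in A_j \times A_k} \mu(p, a_j, a_k)\,\big[u_i(r, a_j, a_k) - u_i(p, a_j, a_k)\big] < 0 \tag{2c}$$
$$\sum_{(a_j, a_k) \in A_j \times A_k} \mu(p, a_j, a_k)\,\big[u_i(s, a_j, a_k) - u_i(p, a_j, a_k)\big] < 0 \tag{2d}$$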
Conditions (2c) and (2d) indicate that player i’s expected payoff would decline if they were to choose, respectively, rock or scissors when signaled to play paper. Each μ term is the probability that the correlation device assigns to a particular signal profile in which paper is recommended to i. The bracketed terms are the changes in utility from i disobeying the signal to play paper and instead choosing rock (2c) or scissors (2d).
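Strategic incentive constraints for player i to follow a signal to play scissors:
$$\sum_{(a_j, a_k) \in A_j \times A_k} \mu(s, a_j, a_k)\,\big[u_i(r, a_j, a_k) - u_i(s, a_j, a_k)\big] < 0 \tag{2e}$$
$$\sum_{(a_j, a_k) \in A_j \times A_k} \mu(s, a_j, a_k)\,\big[u_i(p, a_j, a_k) - u_i(s, a_j, a_k)\big] < 0 \tag{2f}$$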
Finally, choosing rock (2e) or paper (2f) when signaled to play scissors would reduce i’s expected payoff. Any μ satisfying inequalities (2a)–(2f) for every player results in a strict correlated equilibrium of three-player rock-paper-scissors in which players’ action choices are obedient to μ. This definition of a strict correlated equilibrium follows Aumann’s Proposition 2.3 ([6], p. 6). Due to the symmetry of payoffs in G, and provided that μ also treats rock, paper, and scissors symmetrically, constraints (2a), (2d), and (2e) are identical, as are constraints (2b), (2c), and (2f), so that the system of six constraints could be replaced with just two:
As will be explored further in Section 3.2, the first of these two constraints indicates that player i’s payoff would decline if they were to deviate from any recommendation (regardless of whether it is rock, paper, or scissors in particular) by choosing to play what beats the recommended action. The second indicates that player i’s payoff would decline if they were to deviate from any recommendation by choosing to play what the recommended action beats.
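Inequalities (2a)–(2f) can be evaluated mechanically for any distribution over the 27 action profiles. The Python sketch below assumes a payoff table in which the solo-winner (13/4), tied-for-last (−1), and three-way-split (1/2) values follow the numerical example of Section 3.3, the three-way-tie value 1/2 is the one implied by that example’s 13/30 payoff, and the tied-for-first (1/4) and solo-last (−7/4) values are hypothetical, chosen only so that their solo-loser average equals the example’s −5/12. It evaluates the six left-hand sides for player i (the first coordinate) under the example’s distribution, which puts 1/15 on each three-way tie and 4/45 on each one-solo-winner/two-tied-last profile and so treats rock, paper, and scissors symmetrically. All helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("r", "p", "s")
BEATS = {"r": "s", "p": "r", "s": "p"}  # each key beats its value

PAY = {
    "solo_win": Fraction(13, 4),    # Section 3.3
    "tied_last": Fraction(-1),      # Section 3.3
    "split": Fraction(1, 2),        # Section 3.3
    "tie": Fraction(1, 2),          # implied by Section 3.3's 13/30
    "tied_first": Fraction(1, 4),   # HYPOTHETICAL
    "solo_last": Fraction(-7, 4),   # HYPOTHETICAL
}

def payoff(profile, i):
    """u_i for an action profile such as ("p", "r", "r")."""
    kinds = set(profile)
    if len(kinds) == 1:
        return PAY["tie"]
    if len(kinds) == 3:
        return PAY["split"]
    solo = next(a for a in kinds if profile.count(a) == 1)
    pair = next(a for a in kinds if profile.count(a) == 2)
    solo_wins = BEATS[solo] == pair
    if profile[i] == solo:
        return PAY["solo_win"] if solo_wins else PAY["solo_last"]
    return PAY["tied_last"] if solo_wins else PAY["tied_first"]

def constraint_lhs(mu, i, rec, dev):
    """Sum of mu(a) * [u_i(dev, a_-i) - u_i(rec, a_-i)] over profiles with a_i = rec."""
    lhs = Fraction(0)
    for prof, prob in mu.items():
        if prof[i] == rec:
            deviated = prof[:i] + (dev,) + prof[i + 1:]
            lhs += prob * (payoff(deviated, i) - payoff(prof, i))
    return lhs

def example_mu(q):
    """q/3 on each three-way tie, (1 - q)/9 on each one-solo-winner/two-tied-last profile."""
    mu = {}
    for prof in product(ACTIONS, repeat=3):
        kinds = set(prof)
        if len(kinds) == 1:
            mu[prof] = q / 3
        elif len(kinds) == 2:
            solo = next(a for a in kinds if prof.count(a) == 1)
            pair = next(a for a in kinds if prof.count(a) == 2)
            mu[prof] = (1 - q) / 9 if BEATS[solo] == pair else Fraction(0)
        else:
            mu[prof] = Fraction(0)
    return mu

# (recommended action, deviation) behind each of (2a)-(2f)
CONSTRAINTS = {"2a": ("r", "p"), "2b": ("r", "s"), "2c": ("p", "r"),
               "2d": ("p", "s"), "2e": ("s", "r"), "2f": ("s", "p")}

mu = example_mu(Fraction(1, 5))
values = {name: constraint_lhs(mu, 0, rec, dev) for name, (rec, dev) in CONSTRAINTS.items()}
for name, value in values.items():
    print(name, value)
```

With these payoffs, (2a), (2d), and (2e) all evaluate to −7/180, and (2b), (2c), and (2f) all evaluate to −23/180, so all six hold strictly and collapse into two values under this symmetric μ. The first value depends on the hypothetical payoffs only through their solo-loser average; the second shifts with the hypothetical solo-last payoff.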
The full system of six constraints is useful because it defines strict correlated equilibrium for any three-player version of rock-paper-scissors, including versions that differ from G by having asymmetric payoffs. For instance, unlike G, a three-player RPS game could have a solo-winner payoff when rock is the solo winner that differs from the solo-winner payoff when paper or scissors is the solo winner. Inequalities (2a)–(2f) define strict correlated equilibrium both for games with symmetric payoffs such as G, which is the focus of this paper, and for games with asymmetric payoffs.
3.2. Conditions for a Strict Correlated Equilibrium
Proposition 1. Let and let
.
If
, then for every
there is a strict correlated equilibrium with
Proof 1. Every strategy specifies for at least one action that could be recommended to i. Let be the action that beats the action that was recommended and let be the action that the recommended action beats. (i) If , then provided that . (ii) If , then provided that . (i) and (ii) (2a)–(2f) hold for all . □
Under a Proposition 1 type of CE, the probability distribution μ places positive probability only on recommendation profiles that would result in two types of outcomes: three-way ties and one-solo-winner/two-tied-last outcomes. The value of q in Proposition 1 is the total probability placed on all three-way ties, with probability $q/3$ on each of the three possible ways that a three-way tie could occur. The value of $1 - q$ is the total probability placed on all one-solo-winner/two-tied-last outcomes. There are nine different outcomes of this kind, and μ puts probability $(1 - q)/9$ on each such outcome.
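A Proposition 1 type of distribution can be built explicitly. The helper below (a hypothetical name, not taken from the paper) assigns $q/3$ to each three-way tie and $(1 - q)/9$ to each one-solo-winner/two-tied-last profile, which confirms that there are 3 and 9 such profiles and that the probabilities sum to one. The value q = 1/5 is the one used in the numerical example of Section 3.3.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("r", "p", "s")
BEATS = {"r": "s", "p": "r", "s": "p"}  # each key beats its value

def is_solo_winner_outcome(prof):
    """True for a one-solo-winner/two-tied-last profile such as (p, r, r)."""
    if len(set(prof)) != 2:
        return False
    solo = next(a for a in prof if prof.count(a) == 1)
    pair = next(a for a in prof if prof.count(a) == 2)
    return BEATS[solo] == pair

def prop1_mu(q):
    """q/3 on each three-way tie, (1 - q)/9 on each one-solo-winner/two-tied-last
    profile, and 0 on three-way splits and solo-loser profiles."""
    q = Fraction(q)
    mu = {}
    for prof in product(ACTIONS, repeat=3):
        if len(set(prof)) == 1:
            mu[prof] = q / 3
        elif is_solo_winner_outcome(prof):
            mu[prof] = (1 - q) / 9
        else:
            mu[prof] = Fraction(0)
    return mu

mu = prop1_mu(Fraction(1, 5))
support = [prof for prof, prob in mu.items() if prob > 0]
print(len(support), sum(mu.values()))  # 12 1
```

The resulting μ is clearly not a product measure: for example, μ(r, r, r) > 0 and μ(p, r, r) > 0, yet μ(r, p, p) = 0.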
To understand the nature of the Proposition 1 type of equilibrium, two disobedient strategies are particularly important. First, upon receiving a recommendation from the correlation device, a player must prefer to follow that signal over always choosing the action that would defeat the recommended action. Consider the strategy of “always play the action that beats the action that was recommended,” which maps a rock signal to paper, a paper signal to scissors, and a scissors signal to rock. When the recommended action profile is a three-way tie and the other two players, j and k, are following their recommendations, this disobedient strategy achieves a higher payoff than $\tau_i^*$. For instance, if the device recommends (r, r, r), then $u_i(p, r, r) > u_i(r, r, r)$: by unilaterally deviating to paper, player i would receive the largest possible payoff, the solo-winner payoff.
However, this disobedient strategy results in a worse expected payoff than $\tau_i^*$ when the recommended action profile is a one-solo-winner/two-tied-last outcome. Consider, for example, the device recommending rock to two of the players and paper to the remaining player. If player i plays what beats the recommendation while j and k follow their signals, then there is a 1/3 chance that i was designated to be the big winner (i.e., received the recommendation to play paper); by instead choosing scissors, i ends up with the lowest possible payoff, the solo-last payoff. At the same time, given that a one-solo-winner/two-tied-last outcome was recommended, there is a 2/3 probability that player i was designated to end up tied for last (i.e., received a recommendation to play rock in this example); by instead choosing paper, player i ends up tied for first. Thus, when a one-solo-winner/two-tied-last outcome is recommended, there is a 1/3 chance that this deviation delivers a worse payoff to i than $\tau_i^*$ and a 2/3 chance that it delivers a better one. In expectation, the deviation reduces i’s payoff conditional on a one-solo-winner/two-tied-last recommendation. Considering both consequences (the certain payoff improvement given a three-way tie recommendation and the expected payoff reduction given a one-solo-winner/two-tied-last recommendation), the net effect of this deviation on i’s expected payoff is negative, provided that q, the probability of a recommended three-way tie, is not too large.
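To make this trade-off concrete, the rock-signal constraint (2a) can be written out for a Proposition 1 type of μ. The payoff labels here are hypothetical shorthand, not the paper’s notation: $w$ (solo winner), $f$ (tied for first), $t$ (three-way tie), $d$ (tied for last), and $\ell$ (solo last). With profiles ordered $(a_i, a_j, a_k)$, the rock-signal profiles with positive probability are $(r, r, r)$ with probability $q/3$ and $(r, r, p)$, $(r, p, r)$, $(r, s, s)$ with probability $(1-q)/9$ each. Deviating to paper turns them into $(p, r, r)$, $(p, r, p)$, $(p, p, r)$, and $(p, s, s)$:

$$
\frac{q}{3}\,(w - t) \;+\; \frac{1-q}{9}\Big[\,2\,(f - d) + (\ell - w)\,\Big] \;<\; 0
\quad\Longleftrightarrow\quad
3q\,(w - t) \;<\; (1-q)\,K, \qquad K \equiv (w - \ell) - 2\,(f - d).
$$

The first term is the certain gain at a recommended three-way tie; the bracket collects the two profiles in which i moves from tied for last to tied for first and the one in which i falls from solo winner to solo last. When $K > 0$ (the deviation loses in expectation after a one-solo-winner/two-tied-last recommendation) and $w > t$, rearranging gives an upper bound on the probability of three-way ties:

$$
q \;<\; \frac{K}{3\,(w - t) + K}.
$$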
Second, upon receiving a recommendation from the correlation device, a player must prefer to follow that signal over always choosing the action that the recommended action would defeat. Consider the strategy of “always play the action that the recommended action beats,” which maps a rock signal to scissors, a paper signal to rock, and a scissors signal to paper. When the recommended action profile is a three-way tie and the other two players are following their recommendations, this disobedient strategy yields the lowest possible payoff, which is, of course, worse than the payoff from $\tau_i^*$. For instance, if the device recommends (r, r, r), then $u_i(s, r, r) < u_i(r, r, r)$: by unilaterally deviating to scissors, player i would suffer the solo-last payoff.
If, instead, the recommended action profile is a one-solo-winner/two-tied-last outcome, then this disobedient strategy results in a better expected payoff than $\tau_i^*$. Playing what the recommendation beats would hurt i’s payoff if i had been designated to be the solo winner but would improve i’s payoff if i had been designated to be tied for last. If player i was designated to be the solo winner but played what the recommendation beats, they would end up with the three-way tie payoff. For instance, if the device recommends rock to players j and k and paper to player i, then i plays rock and the outcome is (r, r, r), which is worse for i than following the recommendation and ending up as the solo winner. However, if player i had been designated to be tied for last but played what the recommendation beats, they would end up with the three-way split payoff. For instance, if the device recommends rock to i and j and paper to k, then i plays scissors and the outcome is (s, r, p), which is better for i than following the recommendation. In expectation, this deviation increases i’s payoff conditional on a one-solo-winner/two-tied-last recommendation. Considering both consequences (the certain payoff reduction given a three-way tie recommendation and the expected payoff improvement given a one-solo-winner/two-tied-last recommendation), the net effect of this deviation on i’s expected payoff is negative, as long as q is not too small.
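The same bookkeeping applies to (2b), the rock-signal constraint against playing what the recommendation beats. Using hypothetical shorthand labels that are not the paper’s notation, $w$ (solo winner), $t$ (three-way tie), $x$ (three-way split), $d$ (tied for last), and $\ell$ (solo last), and ordering profiles as $(a_i, a_j, a_k)$, deviating to scissors turns $(r, r, r)$ into $(s, r, r)$, turns $(r, r, p)$ and $(r, p, r)$ into three-way splits, and turns $(r, s, s)$ into $(s, s, s)$. For a Proposition 1 type of μ:

$$
\frac{q}{3}\,(\ell - t) \;+\; \frac{1-q}{9}\Big[\,2\,(x - d) + (t - w)\,\Big] \;<\; 0
\quad\Longleftrightarrow\quad
(1-q)\,J \;<\; 3q\,(t - \ell), \qquad J \equiv 2\,(x - d) - (w - t).
$$

If $J > 0$ (the deviation helps in expectation after a one-solo-winner/two-tied-last recommendation) and $t > \ell$, this rearranges to a lower bound on the probability of three-way ties:

$$
q \;>\; \frac{J}{3\,(t - \ell) + J}.
$$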
In summary of Proposition 1, the critical requirement of a strict correlated equilibrium is that the obedient strategy $\tau_i^*$ yields a higher expected payoff than both the “play-what-beats-the-recommendation” strategy and the “play-what-the-recommendation-beats” strategy. The strict CE distribution μ limits how much probability is placed on three-way ties, so that playing the action that beats the recommended action is worse than signal-following. In contrast, the MSNE puts an equal amount of probability on any particular three-way tie as on any particular one-solo-winner outcome, which means that playing what beats the recommendation would be just as good as the MSNE strategy. At the same time, the strict CE distribution μ puts more probability on one-solo-winner/two-tied-last outcomes than the MSNE does. Playing the action that the recommendation beats is worse than the MSNE strategy, and the strict CE exploits this fact but does not raise the probability of one-solo-winner/two-tied-last outcomes so high that this deviation outperforms $\tau_i^*$.
3.3. Numerical Example of Strict Correlated Equilibrium
Figure 2 presents a numerical example of a strict correlated equilibrium for a game with
. The probabilities with which the correlation device recommends particular action profiles are shown in the corresponding cells to the right of the game matrix. Each of the strategic incentive constraints is strictly satisfied:
Strategic incentive constraints for player i to follow a signal to play rock:
Strategic incentive constraints for player i to follow a signal to play paper:
Strategic incentive constraints for player i to follow a signal to play scissors:
The symmetry of payoffs (and of μ) in this numerical example makes (3a), (3d), and (3e) identical, with each indicating that playing the action that beats the recommendation results in a lower expected payoff than obeying the signal. Similarly, (3b), (3c), and (3f) are identical, with each indicating that playing the action that the recommendation beats is worse than signal-following. Each player’s expected payoff from the CE is 13/30, which exceeds the MSNE payoff of 1/6. The CE achieves a higher expected payoff than the MSNE partly because it places more probability on three-way ties, which occur with probability 1/5 under the CE but only 1/9 under the MSNE. A second contributor is that the CE puts more probability on solo-winner outcomes and zero probability on solo-loser outcomes. One-solo-winner/two-tied-last outcomes occur with probability 4/5 under the CE, and conditional on such an outcome, any given player has a 1/3 chance of being the solo winner with the big payoff of +3¼ and a 2/3 chance of tying for last with the second-worst payoff of −1. Hence, when the correlation device recommends a one-solo-winner/two-tied-last outcome, each player’s expected payoff is (1/3)(13/4) + (2/3)(−1) = 5/12. Under the MSNE, such outcomes are counterbalanced by equally likely solo-loser outcomes, which have an expected payoff of −5/12, but the CE puts zero probability on solo-loser outcomes; by doing so, it gives each player an enhanced chance at the biggest payoff available. Regarding three-way splits, which produce payoffs of ½ for each player, the CE puts zero probability on those outcomes, whereas the MSNE puts 2/9 probability on them. All else equal, this reduces players’ payoffs from the CE relative to the MSNE, but the reduction is outweighed by the payoff increases from the CE placing more probability on three-way ties and one-solo-winner/two-tied-last outcomes.
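The arithmetic above can be checked with exact fractions. This Python sketch uses the payoffs stated in the example (+3¼ for the solo winner, −1 for tying for last, ½ for a three-way split, and the −5/12 solo-loser average); it assumes a three-way-tie payoff of ½, the value implied by the stated CE payoff of 13/30 rather than a figure quoted in this paragraph. It also evaluates an upper bound on q obtained by multiplying out constraint (2a) for this distribution, $3q(w - t) < (1 - q)[(w - \ell) - 2(f - d)]$, where $w$, $t$, $f$, $d$, $\ell$ are hypothetical labels for the solo-winner, three-way-tie, tied-for-first, tied-for-last, and solo-last payoffs. Because $f$ and $\ell$ enter only through $\ell + 2f$, which is three times the solo-loser average, the bound can be computed without knowing them separately.

```python
from fractions import Fraction as F

# Payoffs stated in Section 3.3
W = F(13, 4)                 # solo winner
D = F(-1)                    # each of the two players tied for last
X = F(1, 2)                  # three-way split
SOLO_LOSER_AVG = F(-5, 12)   # average payoff given a solo-loser outcome, (l + 2f) / 3
# Assumption: three-way tie payoff of 1/2, the value implied by the stated 13/30
T = F(1, 2)

q = F(1, 5)  # total CE probability on three-way ties

# Conditional on a one-solo-winner/two-tied-last recommendation
solo_winner_avg = F(1, 3) * W + F(2, 3) * D

# CE: probability q on ties, 1 - q on one-solo-winner/two-tied-last outcomes
ce_payoff = q * T + (1 - q) * solo_winner_avg

# MSNE: uniform mixing makes all 27 profiles equally likely
# (3 ties, 6 splits, 9 solo-winner and 9 solo-loser profiles)
msne_payoff = (F(3, 27) * T + F(6, 27) * X
               + F(9, 27) * solo_winner_avg + F(9, 27) * SOLO_LOSER_AVG)

# Upper bound on q from (2a): q < K / (3(w - t) + K), K = (w - l) - 2(f - d).
# Since K = w + 2d - (l + 2f), it needs only the solo-loser average.
K = W + 2 * D - 3 * SOLO_LOSER_AVG
q_upper = K / (3 * (W - T) + K)

print(solo_winner_avg, ce_payoff, msne_payoff, q_upper)  # 5/12 13/30 1/6 10/43
```

Since 1/5 < 10/43 ≈ 0.233, the example’s q satisfies this upper bound with some room to spare, consistent with (3a), (3d), and (3e) holding strictly.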