Cognitive hierarchy theory and two-person games

The outcome of many social and economic interactions, such as stock-market transactions, is strongly determined by the predictions that agents make about the behavior of other individuals. Cognitive Hierarchy Theory provides a framework to model the consequences of forecasting accuracy that has proven to fit data from certain types of game-theory experiments, such as Keynesian Beauty Contests and Entry Games. Here, we focus on symmetric two-player two-action games and establish an algorithm to find the players' strategies according to the Cognitive Hierarchy Approach. We show that the Snowdrift Game exhibits a pattern of behavior whose complexity grows as the cognitive levels of the players increase. In addition to finding the solutions up to the third cognitive level, we demonstrate, in this theoretical frame, two new properties of snowdrift games: i) any snowdrift game can be characterized by a single parameter, its class; ii) snowdrift games are anti-symmetric with respect to the diagonal of the payoff space. Finally, we propose a model based on an evolutionary dynamics that captures the main features of the Cognitive Hierarchy Theory.

It is customary to refer to these two actions as cooperation, when the choice transcends self-interest and focuses on the welfare of the collective, or defection, when it is focused on promoting self-interest. This set of games includes the Stag Hunt (SH) [20], the Snowdrift Game (SG) [18,21], and the Prisoner's Dilemma (PD) [29,30]. SH is a coordination game that describes a conflict between safety and social cooperation: the greatest benefit for both players is obtained when both choose to cooperate, but against a defector the best action is to defect, so that cooperation is both the most advantageous and the riskiest choice. SG is an anti-coordination game where the greatest individual benefit is obtained by defecting against a cooperator, but players are penalized when both choose to defect, so that it is always more advantageous to choose the opposite action to one's opponent. In PD, a player always gets the highest individual benefit by defecting, while the greatest collective benefit is obtained when both cooperate. For completeness, we also study the Harmony Game (HG), where the best choice is always to cooperate, regardless of the opponent's behavior; therefore, there are no tensions between individual and collective benefits. An arrangement of these four games has been studied experimentally, finding that players can be classified into four basic personality types: optimistic, pessimistic, trusting, and envious, with only a small fraction of undefined subjects [31]. Although some of these four games, particularly SH, have been solved according to the Cognitive Hierarchy approach [10], and the solutions for PD and HG are straightforward, SG presents an intricate pattern of behavior as the cognitive level of the players grows. In this study, we establish an algorithm to solve the SG case: in addition to analytically solving it up to the third cognitive level, we show some symmetries valid for all levels.
We round off this study by exploring the situation in which players can change their guesses about how cognitive levels are distributed in the population. Evolutionary Game Theory is concerned with entire populations of agents that choose actions according to some strategy in their interactions with other agents [32-34]. We propose a model based on an evolutionary dynamics, in which the agents of a population interact with one another through the games described above. In this iterated model, the agents have no information about the other players, neither their payoffs nor their actions, but only a one-step memory of their own payoff. According to this dynamics, the players make attempts to modify their assumptions about the distribution of the cognitive levels of the other players, changing their assumptions whenever their payoff decreases. We numerically solve the model using Monte Carlo simulations, finding patterns of behavior compatible with our theoretical predictions.

Two-person games
Symmetric two-player two-action games can be expressed by means of their payoff matrix, where rows represent the focal player's actions, columns represent the opponent's actions, and the corresponding matrix element is the payoff received by the focal player:

         C   D
    C  ( R   S )
    D  ( T   P )    (1)

Actions C and D are usually referred to as cooperation and defection, respectively. Each player chooses one of the two available actions. A cooperator receives R when playing with a cooperator, and S when playing with a defector, while a defector earns P when playing with a defector, and T (temptation) against a cooperator. When T > R > P > S, the game is a Prisoner's Dilemma (PD), while if T > R > S > P it is called Snowdrift Game (SG), also known as Chicken or Hawks and Doves. Otherwise, if S > P and R > T the game is referred to as Harmony Game (HG), while if R > T > P > S it is called Stag Hunt Game (SH).
We consider a well-mixed population of N agents. According to the payoff matrix (1), a cooperator will receive a payoff (N_c - 1)R + (N - N_c)S, where N_c is the number of cooperators in the population, while a defector will receive N_c T + (N - 1 - N_c)P. A given player will obtain a higher payoff by cooperating than by defecting whenever cR + (1 - c)S > cT + (1 - c)P, where c is the fraction of cooperators in the population, excluding the player herself. That is, there is a threshold S_th for the parameter S,

    S_th(T; c) = P + c (T - R) / (1 - c),    (2)

above which a player will obtain a higher payoff by cooperating than by defecting. In order to have a two-dimensional representation of the parameter space of the four types of game described above, let us fix the values of the payoff parameters P = 1, R = 2. By varying the values of T and S over the ranges T ∈ [1, 3] and S ∈ [0, 2], the plane (T, S) can be divided into four quadrants, each one corresponding to a different type of game: HG (T < 2, S > 1), SG (T > 2, S > 1), SH (T < 2, S < 1) and PD (T > 2, S < 1). According to these values, equation (2) becomes

    S_th(T; c) = 1 + c (T - 2) / (1 - c).    (3)

Note that, for fixed T > 2, S_th is an increasing function of c, while it is decreasing for T < 2. This observation will be crucial in some of the arguments in the subsections below.
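As a concrete illustration, the threshold (3) and the quadrant classification can be sketched as follows (a minimal sketch with P = 1 and R = 2 fixed; the function names are ours, not the paper's):

```python
def s_threshold(T, c):
    """Cooperation threshold S_th(T; c) for the fixed payoffs P = 1, R = 2:
    cooperating pays more than defecting iff S > S_th."""
    return 1 + c * (T - 2) / (1 - c)

def game_type(T, S):
    """Quadrant of the (T, S) plane, for T in [1, 3] and S in [0, 2]."""
    if T < 2:
        return "HG" if S > 1 else "SH"
    return "SG" if S > 1 else "PD"
```

For c = 1/2 the threshold reduces to S_th = T - 1, the diagonal line that plays a central role below.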
Nash equilibrium

Cognitive hierarchy theory
According to the cognitive hierarchy theory, each agent i (i = 1, 2, . . . , N) is characterized by her cognitive level l_i (l_i = 0, 1, 2, . . .) and her assumed distribution of the other players' levels. Level-0 players (l_i = 0) choose their actions randomly, which means that a level-0 player cooperates with probability p_c = 1/2, regardless of the values of the payoff matrix. A level-1 player (l_i = 1) assumes that the other players act non-strategically (i.e., as level-0 players). In the same way, a level-h player (h > 1) assumes a heterogeneous population consisting of players of lower levels 0, 1, 2, . . . , h - 1. A strategic agent i (l_i > 0) assumes that the cognitive levels of her N - 1 opponents are distributed according to a given distribution (Camerer et al. [10] considered this to be Poisson). In particular, a level-h player (h > 1) assumes respective fractions g_h(k) of level-k players (k = 0, . . . , h - 1), normalized so that they sum to one. Then, each agent chooses the action that would provide the higher payoff if the cognitive levels of the rest of the agents were distributed according to her assumption. The next subsection is devoted to the analysis of the actions taken by the agents in the four types of games under the assumptions of the cognitive hierarchy theory.
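Under these assumptions, the best response of a level-h player can be computed recursively from the assumed fractions g_h(k). A minimal sketch (with P = 1, R = 2 fixed, and an arbitrary caller-supplied distribution g; names are ours):

```python
def cooperates(h, T, S, g):
    """Best response (True = cooperate) of a level-h player (h >= 1),
    where g(h, k) is the fraction of level-k players she assumes,
    normalized over k = 0, ..., h - 1. Payoffs fixed at P = 1, R = 2."""
    # Assumed cooperation: level-0 opponents cooperate with probability 1/2,
    # lower strategic levels take their own (deterministic) best responses.
    c = g(h, 0) / 2
    for k in range(1, h):
        if cooperates(k, T, S, g):
            c += g(h, k)
    # Cooperate iff S exceeds the threshold S_th(T; c) of equation (3).
    return S > 1 + c * (T - 2) / (1 - c)
```

For instance, with the illustrative uniform assumption g(h, k) = 1/h, a level-1 player at the SG point (T, S) = (2.5, 1.6) cooperates, while a level-2 player defects. (For large h the naive recursion recomputes lower levels; an iterative table, as sketched later, avoids this.)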
B. Analysis

Harmony Game
Provided S > P and R > T, the expected payoff is higher for cooperation regardless of the other players' actions. In consequence, all strategic players (level higher than zero) will choose cooperation. In the HG, cooperation is the only strict best response both to itself and to defection.

Prisoner's Dilemma
Given the payoff ordering T > R > P > S, whatever the value of the cooperation level c, the expected payoff is higher for defection, and that is what a strategic player i (l_i > 0) should choose. In the PD, only the defective action is a strict best response both to itself and to cooperation.

Stag Hunt
A player of level 1 assumes a population consisting of N - 1 opponents of level 0, that is, she assumes a fraction of cooperators c = 1/2. According to equation (3), a level-1 strategist playing a SH should cooperate if and only if

    S > S_th(T; 1/2) = T - 1.    (5)

Now a level-2 player has to consider two situations: (i) For S > T - 1, we have S > S_th(T; 1/2) and level-1 players will cooperate. Thus, the average cooperation c assumed by a level-2 player will be c = g_2(0)/2 + g_2(1) = g_2(0)/2 + 1 - g_2(0) = 1 - g_2(0)/2. Provided g_2(0) < g_1(0) = 1, i.e. level-2 players assume at least one level-1 player, we have c > 1/2, and therefore (using that, for T < 2, S_th is a decreasing function of c) S_th(T; c) < S_th(T; 1/2), which implies that a player of level 2 playing a SH will choose to cooperate if S > T - 1. (ii) For S < T - 1, level-1 players defect, so the assumed cooperation is c = g_2(0)/2 < 1/2, and hence S < T - 1 = S_th(T; 1/2) < S_th(T; c), so that a level-2 player will defect.
Consequently, a level-2 player takes the same action as a level-1 player: cooperate if and only if S > T - 1. Let us assume that level-k players (k = 1, 2, . . . , h - 1) cooperate if and only if S > T - 1. Then, if S > T - 1, a level-h player will assume c = g_h(0)/2 + Σ_{k=1}^{h-1} g_h(k) = 1 - g_h(0)/2 > 1/2 (and c = g_h(0)/2 < 1/2 otherwise), so that she cooperates if and only if S > T - 1, and thus the induction argument allows us to conclude that, in the SH game, all strategic players cooperate if and only if S > T - 1. Summarizing, the line S = T - 1 divides the SH quadrant into two octants: in the upper octant (S > T - 1), all players of level higher than zero cooperate, while in the lower one (S < T - 1) such players defect. This result is general, holding for any normalized distributions g_l(k) (k = 0, . . . , l - 1; l ≥ 1) assumed by the agents, and was already pointed out in [10].

Snowdrift Game
A player of level 1 considers that the rest of the players play at random, so that she assumes c = 1/2. In consequence, a level-1 strategist playing a SG should cooperate if and only if S > S_th(T; 1/2) = T - 1. Note that this condition coincides with the cooperation condition (5) for level-1 players playing a SH game. However, things are different for higher-level players in the SG, as we will now see. From a technical point of view, the reason is that for the SG, where T > 2, S_th is an increasing function of c, reflecting a well-known feature of the Hawk-Dove formulation of the SG, namely that in a population of hawks (doves) it is advantageous to play dove (resp. hawk).
Note that two of the borderlines separating these regions are dependent on the distribution assumed by the level-2 player, i.e. these regions are non-universal.
At this point, one realizes that, regarding the action a level-l player takes, more and more regions may appear in the SG quadrant, depending on the specific assumptions on the distributions g_h(k) (k = 0, . . . , h - 1; l > h ≥ 1). As an illustrative example, see Appendix A for the possibilities that arise for the actions taken by a level-3 player.
Despite this non-universality and increasing complexity with cognitive levels that characterize the actions taken by players of the SG, we show in the next sub(sub)section two general symmetries that hold universally under the assumptions of the cognitive hierarchy theory.

Symmetries in the Snowdrift Game
As before, to simplify notation we will assume the values P = 1 and R = 2, though the arguments below remain valid for other values compatible with the SG.
Given a particular SG, corresponding to a pair of values (T, S) with T > 2 and S > 1, we will say that it is a game of class m whenever

    (S - 1) / (T - 2) = m.

In other words, m is simply the slope of the straight line connecting the points (T = 2, S = 1) and (T, S).
The first statement that we will prove is the following: S1 Any two SG games of the same class m are equivalent, in the sense that any player takes the same action in both games.
To prove this statement, note that Eq. (3) can be rewritten as

    (S - 1) / (T - 2) > c / (1 - c),

so that a rational player playing a game of class m cooperates if m > c/(1 - c), and defects if m < c/(1 - c). Here c is the value of the average cooperation in the population estimated by the rational player under the assumption of a particular distribution of cognitive levels. Now, the value of c that a level-1 player estimates is c = 1/2, irrespective of any consideration, so the action she takes is the same for all games in the same class. Consequently, the estimation of c by a level-2 player is the same in all games of the same class, so that she takes the same action in all of them, and so on for all cognitive levels, which completes the proof of statement S1.
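Statement S1 can also be checked numerically. The following sketch computes the actions of the first few levels under the illustrative uniform assumption g_h(k) = 1/h (our choice, not the paper's), with P = 1, R = 2:

```python
def actions_by_level(T, S, max_level):
    """Actions (True = cooperate) of levels 1..max_level, assuming the
    illustrative uniform distributions g_h(k) = 1/h and P = 1, R = 2."""
    acts = {}
    for h in range(1, max_level + 1):
        # Assumed cooperation: half of the level-0 share, plus the shares
        # of the lower strategic levels that cooperate.
        c = (1.0 / h) / 2 + sum(1.0 / h for k in range(1, h) if acts[k])
        acts[h] = S > 1 + c * (T - 2) / (1 - c)
    return [acts[h] for h in range(1, max_level + 1)]

# Two SG games of the same class m = 1/2 produce identical action lists:
same_class = actions_by_level(2.5, 1.25, 6) == actions_by_level(3.0, 1.5, 6)
```

The non-trivial alternation of the resulting action sequence illustrates the anti-coordination character of the SG across levels.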
To avoid possible misunderstandings, let us emphasize that the payoffs received by a player in two equivalent games can be very different. The notion of equivalence between games means here equality of the actions taken by an agent in both games, but it doesn't mean equality of payoffs received.
In what follows, "game m" denotes a game of class m. A second symmetry is the following: S2 The action that a player takes in the game m is the opposite of the action she takes in the game m⁻¹.
Level-1 players satisfy statement S2 trivially, for if m > 1, then m⁻¹ < 1. Now, let us assume that the statement holds for levels 1, . . . , l - 1. Let us call C_l the subset of these levels whose action in the game m is cooperation, and D_l its complement. Then, level-l players estimate

    c = g_l(0)/2 + Σ_{k∈C_l} g_l(k)

for the game m, while they estimate

    c' = g_l(0)/2 + Σ_{k∈D_l} g_l(k) = 1 - c

for the game m⁻¹, where the last equality follows from the normalization condition on the distribution of cognitive levels.
Since c' = 1 - c implies c'/(1 - c') = (c/(1 - c))⁻¹, the cooperation condition m > c/(1 - c) in the game m is equivalent to m⁻¹ < c'/(1 - c'), i.e. to defection in the game m⁻¹. Consequently, level-l players satisfy statement S2, and the induction argument proves the statement for all levels.

C. Dynamics
In this subsection we introduce a very simple dynamics for the temporal evolution of the distribution that each agent assumes for the cognitive levels of the population, and show results for this dynamics. The assumption is that the only information available to each agent i at a given instant of time t > 1 is her current payoff, Π_i(t), and her previous payoff, Π_i(t - 1). Before presenting the dynamics, we briefly discuss the types of distribution of cognitive levels considered in the simulations.

Distributions of cognitive levels
The first type of distribution that we have considered (below referred to as scenario A) is the "normalized" (truncated) Poisson distribution employed in reference [10], defined as follows. A Poisson distribution is described by a single parameter τ, which is both its mean and its variance:

    f_τ(k) = e^{-τ} τ^k / k!.    (12)

A strategic agent i whose cognitive level is l_i (> 0) assumes a value τ = τ_i, and that the cognitive levels l_j (= 0, . . . , l_i - 1) of her opponents are distributed according to

    g^A_{l_i, τ_i}(l_j) = f_{τ_i}(l_j) / C_i,    (13)

where f_τ is the Poisson distribution (12), and C_i is the appropriate normalization constant, i.e.

    C_i = Σ_{k=0}^{l_i - 1} f_{τ_i}(k).    (14)
Writing equation (13) explicitly, one has

    g^A_{l_i, τ_i}(l_j) = (τ_i^{l_j} / l_j!) / Σ_{k=0}^{l_i - 1} (τ_i^k / k!).    (15)

A second type of cognitive-levels distribution (scenario B) uses, instead of a Poisson distribution, an exponential law: a strategic agent i whose cognitive level is l_i (> 0) now assumes that the cognitive levels l_j (= 0, . . . , l_i - 1) of her opponents are distributed according to the correspondingly normalized exponential distribution g^B_{l_i}(l_j) (equation (18)). The third type of distribution (scenario C) that we consider here is just the normalized uniform distribution

    g^C_{l_i}(l_j) = 1 / l_i,  l_j = 0, . . . , l_i - 1.    (19)
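The three scenarios can be sketched as follows. Scenario A implements the truncated Poisson of equation (15) and scenario C the uniform distribution (19); for scenario B we take the weight e^{-k} as an illustrative exponential law, since only its normalized truncation matters here:

```python
import math

def g_A(l, tau, k):
    """Scenario A: truncated Poisson over k = 0, ..., l - 1
    (the common factor e^{-tau} cancels in the normalization)."""
    w = [tau ** j / math.factorial(j) for j in range(l)]
    return w[k] / sum(w)

def g_B(l, k):
    """Scenario B: truncated exponential over k = 0, ..., l - 1
    (base e is our illustrative choice)."""
    w = [math.exp(-j) for j in range(l)]
    return w[k] / sum(w)

def g_C(l, k):
    """Scenario C: uniform over the l levels below the player's own."""
    return 1.0 / l
```

Each of the three returns a properly normalized distribution over the l assumed levels, as required by the theory.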

Dynamics algorithm
One must first specify the initial condition (t = 0) for the dynamics. In the simulations shown below, our choice is a population with cognitive levels l_i (0 ≤ l_i ≤ l_max) distributed according to a truncated Poisson distribution g^A_{l_max, τ}(l_i) given by equation (15), with τ = 1.5 and l_max = 20. For the cases in which the distribution of cognitive levels assumed by the agents is also truncated Poisson, the initial rate parameter τ_i of an agent i is taken to be τ_i = 0 if l_i ≤ 1, and τ_i = (l_i - 1)/2 otherwise. Then the agents simultaneously play a one-shot game where the action taken by each strategic agent is the best response for her assumed distribution (either g^A_{l_i, τ_i}(l_j), g^B_{l_i}(l_j), or g^C_{l_i}(l_j)) of the cognitive levels of her opponents, each one receiving an initial payoff Π_i(0).
Thereafter, the dynamics proceeds according to the following rules at each time step t > 0:

Step 1 The agents play simultaneously, each with the action that is the best response according to her current beliefs (random for level-0 players), each one receiving a payoff Π_i(t).
Step 2 Each agent i compares her current and previous payoffs. If Π_i(t) ≥ Π_i(t - 1), agent i keeps her current belief about the population distribution, while if Π_i(t) < Π_i(t - 1), she makes an attempt to change her belief.
The attempt to change the currently assumed distribution, for the cases in which this is g^B_{l_i} or g^C_{l_i} (scenarios B or C), consists of two mutually exclusive possible events:
• With probability u, agent i varies her level according to l_i(t + 1) = l_i(t) ± 1, that is, she increases or decreases her level l_i by one unit in an equiprobable way.
For the cases in which the agents assume a truncated Poisson distribution, g^A_{l_i, τ_i} (scenario A), the attempt to change current beliefs consists of three mutually exclusive possible events:
• With probability u, agent i varies her level according to l_i(t + 1) = l_i(t) ± 1, that is, she increases or decreases her level l_i by one unit in an equiprobable way.
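A compact sketch of one possible implementation of this dynamics, for scenario C and a simplified initial condition (uniform initial levels instead of the paper's truncated Poisson), might look as follows; the mean-field payoff and all names are our illustrative choices:

```python
import random

def best_responses(l_max, T, S):
    """Deterministic best responses of levels 1..l_max under the uniform
    assumption g_C (index 0 unused: level-0 agents act at random)."""
    acts = [None]
    for l in range(1, l_max + 1):
        c = (1.0 / l) / 2 + sum(1.0 / l for k in range(1, l) if acts[k])
        acts.append(S > 1 + c * (T - 2) / (1 - c))
    return acts

def simulate(T, S, N=200, steps=100, u=0.45, l_max=20, seed=0):
    rng = random.Random(seed)
    acts_table = best_responses(l_max, T, S)
    levels = [rng.randint(0, 3) for _ in range(N)]  # simplified initial levels
    prev = None
    coop = 0.0
    for _ in range(steps):
        # Step 1: simultaneous play (random action for level-0 agents).
        a = [acts_table[l] if l > 0 else rng.random() < 0.5 for l in levels]
        nc = sum(a)
        # Mean-field payoff against the N - 1 opponents (P = 1, R = 2).
        pay = []
        for ai in a:
            co = (nc - ai) / (N - 1)  # fraction of cooperating opponents
            pay.append(2 * co + S * (1 - co) if ai else T * co + 1 - co)
        # Step 2: agents whose payoff decreased may shift their level by one.
        if prev is not None:
            for i in range(N):
                if pay[i] < prev[i] and rng.random() < u:
                    levels[i] = min(l_max, max(0, levels[i] + rng.choice((-1, 1))))
        prev = pay
        coop = nc / N
    return coop
```

The function returns the fraction of cooperators in the last round; averaging it over many seeds and rounds would approximate the stationary quantities shown in figure 2.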
Let us note that the presence of non-strategic (level-0) agents in the initial population is, within this dynamics, a necessary condition for a proper time evolution: a non-strategic agent chooses her action at random, and thus with probability 1/2 her action at t = 1 differs from that at t = 0, making it possible that Π_i(1) < Π_i(0) for some i. We have performed Monte Carlo simulations of the dynamics defined above for the cognitive hierarchy theory of the SG, for the three scenarios A, B, and C that correspond, respectively, to the agents' assumption for the cognitive-levels distribution given by g^A (equation (15)), g^B (equation (18)), and g^C (equation (19)). In all cases, the initial conditions were as described in the previous subsection, i.e. cognitive levels were initially distributed in the population according to a truncated Poisson distribution with τ = 1.5 and l_max = 20. The population size is N = 10^3, the probability of changing the current cognitive level (provided the payoff decreases) is u = 0.45, and, for scenario A, δ = 1 and v = 0.45.

Simulation results
In figure 2 we show, for the three scenarios, the value averaged over one hundred simulations (for each (T, S) point) of the fraction of cooperators and of the average cognitive level in the stationary state reached by the dynamics over the whole SG quadrant (T ∈ [2, 3], S ∈ [1, 2]).
From inspection of figure 2, a visible result is that the symmetry S1 (equivalence of games in the same class m) is nearly preserved by the dynamics. The result is indeed remarkable, to the extent that the preservation of this symmetry requires some specific conditions to hold, so that the symmetry conservation is non-generic. This is discussed in detail in Appendix B, where those specific conditions are derived. On the contrary, figure 2 clearly shows the breaking of the symmetry S2 (mirror anti-symmetry with respect to the main diagonal of the SG quadrant), in full agreement with the analysis of this symmetry in Appendix B, which shows that no specific conditions are needed for the breaking of this symmetry.
Regarding the cooperation level reached in the different scenarios, the differences are also remarkable. In scenario A, near-full defection largely dominates below the main diagonal, with a sharp change above it to cooperation values larger than 1/2, which show an overall tendency to increase with the value m of the equivalence-class slope, and near-full cooperation as m → ∞. In figure 3 we represent the histograms, for a few selected points of the SG quadrant (T ∈ [2, 3], S ∈ [1, 2]), of the cognitive levels in the stationary state for scenario A. Each point (T, S) of a lower panel belongs to the equivalence class of the corresponding upper panel, to show the near preservation of the symmetry S1. Figures 4 and 5 are as figure 3, but for scenarios B and C, respectively. The differences between the three figures are merely quantitative, as they exhibit the same main qualitative features. This observation points to the conclusion that the qualitative aspects of the distribution of cognitive levels in the stationary state of this dynamics are, to a large extent, rather insensitive to the agents' beliefs. This should undoubtedly be ascribed to the very scarce information (own current and previous payoff) available to the agents in this dynamics. On the other hand, this insensitivity is in contrast with the large differences in the average cooperation reached in the three scenarios, as observed in figure 2. However, this is in no way contradictory: even for an identical distribution of agents' cognitive levels in the population, as far as the agents conform their actions to their beliefs (and not to the real distribution, which they ignore), different scenarios (i.e. different beliefs) produce different cooperation patterns.
The Poisson-like aspect of the histograms in figures 3, 4 and 5 may suggest that, given that our initial condition for the cognitive-levels distribution is truncated Poisson, the dynamics preserves the type of initial distribution, with perhaps some shift.

III. CONCLUSIONS
We have analyzed the cognitive hierarchy theory for agents playing two-person two-action games in a well-mixed population. While for the HG, PD and SH games the results are straightforward and universal, i.e. independent of the specific distribution of cognitive levels assumed by the agents, for the SG the analysis shows an increasing complexity with cognitive levels, with results that are non-universal, in the sense that the actions taken by high-cognitive-level agents depend on the specifics of the assumed distribution. Despite this non-universality, we find two exact symmetries: for a given assumed distribution of cognitive levels, agents of a fixed cognitive level take the same action (symmetry S1) in all the games (T, S) sharing the value of m = (S - 1)/(T - 2), while they take the opposite action (symmetry S2) in all the games (T', S') with (S' - 1)/(T' - 2) = m⁻¹.

We introduce a stochastic dynamics where agents can update their current beliefs about the distribution of cognitive levels in the population, with no available information other than their current and previous payoffs. The simulations of the SG for this dynamics converge to stationary states of the population characterized by an average fraction of cooperators that depends largely on the agents' beliefs, but where, in contrast, the distribution of cognitive levels reached is rather insensitive to those beliefs.
We provide arguments showing that, for synchronous updating, the previous dynamics necessarily breaks the symmetry S2, while the breaking of the symmetry S1 requires some specific conditions, so that although its preservation is non-generic, it is not forbidden. Our simulations for the different scenarios show the breaking of the symmetry S2, and an apparent conservation of the symmetry S1.

Appendix A
Let us first recall the results of section 2.2.4 concerning the actions of lower-level players: 1. Level-1 players cooperate if and only if S > T - 1.
Due to the existence of the symmetry S2 (see section 2.2.5), we can restrict consideration to regions (a) and (b), that is, the octant 1 < S < T - 1.

Appendix B
The question addressed in this appendix is whether or not the dynamics introduced above preserves the symmetries S1 and S2 of the cognitive hierarchy theory of the SG game.
The preservation of the symmetry S1 requires that the decision of every agent i at any time t of changing her beliefs be the same for all the games in the same class m of equivalence, i.e., that the sign of respectively, the fraction of cooperators at times t and t + 1 for the game (S, m), the values corresponding to the game (S , m −1 ) are c 0 = 1 − c 0 and c 1 = 1 − c 1 .
Under the usual restriction S < 2, we see that sign ∆(S, m) = -sign ∆(S', m⁻¹), so that the updating decisions are opposite, and the symmetry S2 is broken, whenever δ_c = c_1 - c_0 ≠ 0. Note that if δ_c = 0, both differences are zero and, in both games, the agent does not attempt an update. Let us now consider the case of an agent that cooperates at time t but defects at time t + 1 in the game (S, m), so that she defects at t and cooperates at t + 1 in the mirror-symmetric game. The payoff differences are

    ∆(S, m) = δ_c + (S - 1) (c_0 (1 + m⁻¹) + m⁻¹ δ_c - 1),

where δ_c = c_1 - c_0, and we have used equation (23). Thus, if -δ_c + c_0(1 + m) - m > 0, the payoff differences have opposite signs for δ_c - c_0(1 + m) + m < ∆(S, m) < 0, while if -δ_c + c_0(1 + m) - m < 0 they have opposite signs for 0 < ∆(S, m) < δ_c - c_0(1 + m) + m.
Summarizing, the symmetry S2 is preserved for δ_c = 0, but it is always broken for δ_c ≠ 0.