Competing Conventions with Costly Information Acquisition

Abstract: We consider an evolutionary model of social coordination in a 2 × 2 game where two groups of players prefer to coordinate on different actions. Players can pay a cost to learn their opponent's group: if they pay it, they can condition their action on the group. We assess the stability of outcomes in the long run using stochastic stability analysis. We find that three elements matter for equilibrium selection: group size, the strength of preferences, and the cost of information. If the cost is too high, players never learn the group of their opponents in the long run. If one group has stronger preferences for its favorite action than the other, or is sufficiently larger than the other, every player plays that group's favorite action. If both groups have strong enough preferences, or if neither group is sufficiently large, players play their favorite actions and miscoordinate in inter-group interactions. Lower costs favor coordination: when the cost is low, players always coordinate on their favorite action in inside-group interactions, while in inter-group interactions they coordinate on the favorite action of the group that has stronger preferences or is large enough.


Introduction
Since the seminal contribution of Kandori et al. [1], evolutionary game theorists have used stochastic stability analysis and 2 × 2 coordination games to study the formation of social conventions (Lewis [2] and Bicchieri [3] are classical references on social conventions from philosophy, while for economics, see Schelling [4], Young [5], and Young [6]). Some of these works focus on coordination games such as the battle of the sexes: a class that describes situations in which two groups of people prefer to coordinate on different actions. In this framework, the long-run convention may depend on how easily people can learn each other's preferences.
Think about Bob and Andy, who want to hang out together: they can either go to a football match or to the cinema. Both Andy and Bob prefer football, but they do not know what the other prefers. In certain contexts, learning each other's preferences may require too much effort. In these cases, if Bob and Andy know that everybody usually goes to the cinema, they go to the cinema without learning each other's preferences. In other situations, learning each other's preferences may require a small effort (for instance, watching each other's Facebook walls). In this case, Bob and Andy learn that they both prefer football, so they go to a football match together.
In this work, we contribute to the literature on coordination games. We show which conventions become established between two groups of people with different preferences when players can learn each other's preferences by exerting an effort. We do so by formalizing the previous example and studying the evolution of conventions in a dynamic setting. We model the coordination problem as a repeated language game (Neary [7]): we use evolutionary game theory solution concepts and characterize the long-run equilibrium as the stochastically stable state (see Foster and Young [8], Kandori et al. [1], and Young [9]).
We consider a population divided into two groups, which repeatedly play a 2 × 2 coordination game. We assume that one group is larger than the other and that the two groups prefer to coordinate on different actions.
The paper is organized as follows: In Section 2, we explain the model's basic features. In Section 3, we determine the results for the complete information case, where the cost is 0. In Section 4, we derive the results for the case with incomplete information and costly acquisition, distinguishing between low cost and high cost. In Section 5, we discuss the results, and in Section 6, we conclude. We give all proofs in Appendix A and the intuition in the text.

The Model
We consider N players divided into two groups, A and B, with N = N_A + N_B. We assume N_A > N_B + 1 and N_B > 1. Each period, players are randomly matched in pairs to play the 2 × 2 coordination game represented in Tables 1-3. Matching occurs with uniform probability, regardless of the group. Tables 1 and 2 represent inside-group interactions, while Table 3 represents inter-group interactions (group A row player and group B column player). We assume that Π_A > π_A, and thus we name a the favorite action of group A. Likewise, we assume Π_B > π_B, and hence b is the favorite action of group B. We do not assume any particular order between Π_B and Π_A. However, without loss of generality, we assume that Π_A + π_A = Π_B + π_B. Consider K ∈ {A, B} and K′ ∈ {A, B} with K′ ≠ K. We say that group K is stronger in preferences for its favorite action than group K′ if Π_K > Π_{K′} or, equivalently, π_K < π_{K′}.   Table 3. Payoff matrix of inter-group interactions.
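The payoff tables referenced above did not survive extraction. Under the assumptions in the text, and additionally assuming pure-coordination payoffs with zero payoff under miscoordination (as in Neary's language game), they should take a form along these lines (a sketch, not the original tables):

```latex
% Sketch of the payoff structure described in the text, assuming
% zero payoff off the diagonal (miscoordination), as in the language game.
\[
A \text{ vs. } A:\quad
\begin{array}{c|cc}
 & a & b \\ \hline
a & \Pi_A,\ \Pi_A & 0,\ 0 \\
b & 0,\ 0 & \pi_A,\ \pi_A
\end{array}
\qquad
B \text{ vs. } B:\quad
\begin{array}{c|cc}
 & a & b \\ \hline
a & \pi_B,\ \pi_B & 0,\ 0 \\
b & 0,\ 0 & \Pi_B,\ \Pi_B
\end{array}
\]
\[
\text{Inter-group (row } A,\ \text{column } B):\quad
\begin{array}{c|cc}
 & a & b \\ \hline
a & \Pi_A,\ \pi_B & 0,\ 0 \\
b & 0,\ 0 & \pi_A,\ \Pi_B
\end{array}
\]
```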
Each period, players choose whether to pay a cost to learn their opponent's group or not before choosing between action a and b. If they do not pay it, they do not learn the group of their opponent, and they play one single action valid for both groups. If they pay it, they can condition the action on the two groups. We call information choice the first and coordination choice the second.
Consider player i ∈ K. τ_i is the information choice of player i: if τ_i = 0, player i does not learn the group of her/his opponent; if τ_i = 1, player i pays a cost c and learns the group. We assume that c ≥ 0. x_{0i} ∈ {a, b} is the coordination choice when τ_i = 0. If τ_i = 1, x^K_{1i} ∈ {a, b} is the coordination choice when player i meets group K, while x^{K′}_{1i} ∈ {a, b} is the coordination choice when player i meets group K′.
A pure strategy of a player consists of her/his information choice, τ_i, and of her/his coordination choices conditioned on the information choice, i.e., s_i = (τ_i, x_{0i}, x^A_{1i}, x^B_{1i}). Each player has sixteen strategies. However, we can safely neglect some strategies because they are both payoff-equivalent (a player earns the same payoff regardless of which of them s/he chooses) and behaviorally equivalent (a player earns the same payoff independently of which of the equivalent strategies the other players play against her/him). We consider a model of noisy best-response learning in discrete time (see Kandori et al. [1], Young [9]).
Each period t = 0, 1, 2, . . ., independently of previous events, there is a positive probability p ∈ (0, 1) that a player is given the opportunity to revise her/his strategy. When such an event occurs, each player who is given the revision opportunity chooses with positive probability a strategy that maximizes her/his payoff at period t. s_i(t) is the strategy played by player i at period t. U^s_i(s′, s_{−i}) is the payoff of player i who chooses strategy s′ against the strategy profile s_{−i} played by all the other players except i. Such a payoff depends on the random matching assumption and on the payoffs of the underlying 2 × 2 game. At period t + 1, player i chooses a payoff-maximizing strategy; if there is more than one strategy that maximizes the payoff, player i assigns the same probability to each of those strategies. The above dynamics delineates a Markov process that is ergodic thanks to the noisy best response property.
We group the sixteen strategies into six classes of analogous strategies that we call behaviors. We name behavior a (b) the set of strategies in which player i ∈ K chooses τ_i = 0 and x_{0i} = a (b). We name behavior ab the set of strategies in which player i chooses τ_i = 1, x^K_{1i} = a, and x^{K′}_{1i} = b, and so on. Z is the set of possible behaviors: Z = {a, b, ab, ba, aa, bb}. z_i(t) is the behavior played by player i at period t, as implied by s_i(t); z_{−i}(t) is the behavior profile played by all the other players except i at period t, as implied by s_{−i}(t). Note that behaviors capture all the relevant information when players are myopic best repliers. U^z_i(z′, z_{−i}(t)) is the payoff for player i who chooses behavior z′ against the behavior profile z_{−i}(t). Such a payoff depends on the random matching assumption and on the payoffs of the underlying 2 × 2 game. The dynamics of behaviors as implied by strategies coincides with the dynamics of behaviors obtained by assuming that players myopically best reply to a behavior profile. We formalize the result in the following lemma.
Lemma 1. Given the dynamics of z_i(t + 1) as implied by s_i(t + 1), it holds that z_i(t + 1) coincides with the behavior chosen by a player who myopically best replies to the behavior profile z_{−i}(t).
To see why, consider a player i ∈ A such that the best thing to do for her/him is to play a with every player s/he meets, regardless of the group. In this case, both (0, a, a, b) and (0, a, b, b) maximize her/his payoff. In contrast, (0, b, a, b) does not maximize her/his payoff, since in this case s/he plays b with every player s/he meets. Moreover, the payoff of player i is the same whichever of the maximizing strategies s/he picks. Therefore, all the strategies that belong to the same behavior are payoff-equivalent and behaviorally equivalent.
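The revision rule described above can be written compactly; this is our reconstruction using the notation already introduced:

```latex
\[
s_i(t+1) \;\in\; \arg\max_{s'}\; U^{s}_i\bigl(s',\, s_{-i}(t)\bigr),
\]
% with uniform tie-breaking: if several strategies attain the maximum,
% player i chooses each of them with equal probability.
```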
A further reduction is possible because aa (bb) is behaviorally equivalent to a (b) for each player. This observation, and the fact that we are interested in the number of players playing a with each group, lead us to introduce the following state variable. We denote by n^{AA} (n^{BB}) the number of players of group A (B) playing action a with group A (B), and by n^{AB} (n^{BA}) the number of players of group A (B) playing action a with group B (A). We define states as vectors of four components, ω = (n^{AA}, n^{AB}, n^{BA}, n^{BB}), with Ω the state space and ω_t = (n^{AA}_t, n^{AB}_t, n^{BA}_t, n^{BB}_t) the state at period t. At each t, all players know all the components of ω_t. Consider player i playing behavior z_i(t) at period t. U^{z_i(t)}_i(z′, ω_t) is the payoff of i if s/he chooses behavior z′ at period t + 1 against the state ω_t. All that matters for a decision-maker is ω_t and z_i(t). We formalize the result in the following lemma.

Lemma 2.
Given the dynamics of ω_{t+1} generated by z_i(t + 1), it holds that U^z_i(z′, z_{−i}(t)) = U^{z_i(t)}_i(z′, ω_t).
If players are randomly matched, it is as if each player plays against the entire population. Therefore, each player of group K myopically best responds to the current period by looking at how many players of each group play action a with group K. Moreover, a player who is given the revision opportunity subtracts her/himself from the component of ω_t to which s/he belongs. If i ∈ K is playing behavior a, aa or ab at period t, s/he knows that n^{KK}_t − 1 players of group K are playing action a with group K at period t. Define by θ_{t+1} the set of players that are given the revision opportunity at period t. Given Lemma 2, it holds that ω_{t+1} depends on ω_t and on θ_{t+1}. That is, we can define a map F(·) such that ω_{t+1} = F(ω_t, θ_{t+1}). The set θ_{t+1} reveals whether each player who is given the revision opportunity is playing a behavior among a, aa, and ab, or among b, bb, and ba. In the first case we should look at U^a_i, in the second at U^b_i. From now on, we will refer to behaviors and states following the simplifications described above.
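The payoff computation implied by Lemma 2 can be sketched as follows. This is our own code, not the paper's: the function name `payoff_A`, the per-opponent normalization by N − 1, and the assumption that miscoordination pays 0 are all ours.

```python
# Expected per-match payoff of a group-A player, given the state
# omega = (nAA, nAB, nBA, nBB) defined in the text.  A sketch under the
# assumption that miscoordination pays 0.
def payoff_A(behavior, omega, NA, NB, PiA, piA, c=0.0, i_plays_a_with_A=True):
    nAA, nAB, nBA, nBB = omega
    N = NA + NB
    # Player i subtracts her/himself from the component s/he belongs to.
    a_A = nAA - 1 if i_plays_a_with_A else nAA  # A-opponents playing a with A
    b_A = (NA - 1) - a_A                        # A-opponents playing b with A
    a_B, b_B = nBA, NB - nBA                    # B-opponents' action toward A
    if behavior == 'a':    # no information, play a with everyone
        return PiA * (a_A + a_B) / (N - 1)
    if behavior == 'b':    # no information, play b with everyone
        return piA * (b_A + b_B) / (N - 1)
    if behavior == 'ab':   # pay c, play a with group A and b with group B
        return (PiA * a_A + piA * b_B) / (N - 1) - c
    if behavior == 'ba':   # pay c, play b with group A and a with group B
        return (piA * b_A + PiA * a_B) / (N - 1) - c
    raise ValueError(behavior)

# At PS_a = (NA, NA, NB, 0) with c = 0, playing a with everyone yields Pi_A:
print(payoff_A('a', (10, 10, 5, 0), 10, 5, 10.0, 8.0))  # -> 10.0
```

The self-subtraction argument `i_plays_a_with_A` mirrors the remark in Lemma 2 that a revising player removes her/himself from the relevant component of ω_t.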
We illustrate here the general scheme of our presentation. We divide the analysis into two cases: complete information and incomplete information. For each case, we consider unperturbed dynamics (players choose the best reply behavior with probability 1) and perturbed dynamics (players choose a random behavior with a small probability). First, we help the reader understand how each player evaluates her/his best reply behavior and which states are absorbing. Second, we highlight the general structure of the dynamics with perturbation and then determine the stochastically stable states. In the next section, we analyze the case with complete information, hence, when the cost is zero.

Complete Information with Free Acquisition
In this section, we assume that each player can freely learn the group of her/his opponent when randomly matched with her/him. Without loss of generality, we assume that players always learn the group of their opponent in this case. We refer to this condition as free information acquisition. Each player has four possible behaviors as defined in the previous section. Z = {aa, ab, ba, bb}, with a = aa, and b = bb in this case.
Equations (1)-(4) are the payoffs for a player i ∈ K playing aa or ab at period t.

Unperturbed Dynamics
We begin the analysis for complete information by studying the dynamics of the system when players play their best reply behavior with probability one.
The intuition behind the result is as follows. If players always learn their opponent's group, the inter-group dynamics does not interfere with the inside-group dynamics, and vice versa. If player i ∈ K is given the revision opportunity, s/he chooses x^K_{1i} based only on n^{KK}_t. Consider a subset of eight states:

Lemma 4.
Under free information acquisition, the states in ω R are the unique absorbing states of the system.
We call (N_A, N_A, N_B, N_B) and (0, 0, 0, 0) Monomorphic States (MS from now on). Specifically, we refer to the first one as MS_a and to the second as MS_b. We label the remaining six as Polymorphic States (PS from now on). We call (N_A, N_A, N_B, 0) PS_a and (N_A, 0, 0, 0) PS_b. In MS, every player plays the same action with any other player; in PS, at least one group is conditioning the action. In MS_a, every player plays aa; in MS_b, every player plays bb. In PS_a, group A plays aa and group B plays ba. In PS_b, group A plays ab while group B plays bb. In both PS_a and PS_b, all players coordinate on their favorite action with those of their own group.
In the model of Neary, only three absorbing states were possible: the two MS and a Type Monomorphic State where group A plays aa and group B plays bb. The PS were not present in that analysis. These additional absorbing states arise in our analysis thanks to the possibility of conditioning the action on the group.
We can break the absorbing states in ω^R into the three dynamics in which we are interested. This simplification helps in understanding why only these states are absorbing. For instance, in inter-group interactions, there are just two possible absorbing states, namely (N_A, N_B) and (0, 0). For inside-group interactions, the absorbing values are N_A and 0 for n^{AA}_t, and N_B and 0 for n^{BB}_t. For each dynamic, the states where every player plays a, or every player plays b, with one group are absorbing. Here we can see the importance of Lemma 3: in each of the dynamics we are studying, there are just two candidates to be stochastically stable, which simplifies the stochastic stability analysis.

Perturbed Dynamics
We now introduce perturbations in the model presented in the previous section; that is, players can make mistakes while choosing their behaviors: there is a small probability that a player does not choose her/his best response behavior when s/he is given the revision opportunity. We use tools and concepts developed by Freidlin and Wentzell [10] and refined by Ellison [11].
Given perturbations, ω t+1 depends on ω t , θ t+1 and on which players make a mistake among those who are given the revision opportunity. We define with ψ t+1 the set of players who do not choose their best reply behavior among those who are given the revision opportunity. Formally, ω t+1 = F(ω t , θ t+1 , ψ t+1 ).
We use uniform mistakes: the probability of making a mistake is equal for every player and every state. At each period, if a player is given the revision opportunity, s/he makes a mistake with probability ε. In this section, we assume that players make mistakes only in the coordination choice: assuming c = 0, adding mistakes also in the information choice would not influence the analysis. Note that Lemma 3 is still valid under this specification.
If we consider a sequence of transition matrices {P_ε}_{ε>0}, with associated stationary distributions {µ_ε}_{ε>0}, then by continuity the accumulation point of {µ_ε}_{ε>0}, which we call µ*, is a stationary distribution of P* := lim_{ε→0} P_ε. Mistakes guarantee the ergodicity of the Markov process and the uniqueness of the invariant distribution. We are interested in the states that have positive probability in µ*.
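This limit can be illustrated numerically with a toy two-state perturbed chain (our own example, not the model's full state space): leaving one state costs two mistakes while leaving the other costs one, so the limiting stationary distribution concentrates on the harder-to-leave state.

```python
import numpy as np

# Toy illustration of the limit mu*: a two-state chain whose transition
# probabilities are polynomial in eps.  Leaving state 0 costs two mistakes
# (prob eps^2), leaving state 1 costs one (prob eps), so mu_eps
# concentrates on state 0 as eps -> 0.
def stationary(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

for eps in (1e-1, 1e-2, 1e-3):
    P = np.array([[1 - eps**2, eps**2],
                  [eps,        1 - eps]])
    mu = stationary(P)
    print(eps, mu.round(4))
# mu puts vanishing mass on state 1 as eps shrinks
```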
We define some useful concepts from Ellison [11]. Let ω̄ be an absorbing state of the unperturbed process. D(ω̄) is the basin of attraction of ω̄: the set of initial states from which the unperturbed Markov process converges to ω̄ with probability one. The radius of ω̄ is the number of mistakes needed to leave D(ω̄) when the system starts in ω̄. Define a path from state ω̄ to state ω′ as a sequence of distinct states (ω_1, ω_2, . . . , ω_T), with ω_1 = ω̄ and ω_T = ω′. Υ(ω̄, ω′) is the set of all paths from ω̄ to ω′. Define r(ω_1, ω_2, . . . , ω_T) as the resistance of the path (ω_1, ω_2, . . . , ω_T), namely the number of mistakes needed to pass from state ω̄ to state ω′ along that path. The radius of ω̄ is then the minimum resistance over all paths leaving D(ω̄).
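In symbols, the radius and coradius can be written as follows (our reconstruction of Ellison's definitions, in the notation above):

```latex
\[
R(\bar\omega) \;=\; \min_{\omega' \notin D(\bar\omega)}\;
\min_{(\omega_1,\dots,\omega_T)\,\in\,\Upsilon(\bar\omega,\,\omega')}
r(\omega_1,\omega_2,\dots,\omega_T),
\]
% and the coradius takes the worst starting point outside \bar\omega:
\[
CR(\bar\omega) \;=\; \max_{\omega' \neq \bar\omega}\;
\min_{(\omega_1,\dots,\omega_T)\,\in\,\Upsilon(\omega',\,\bar\omega)}
r(\omega_1,\omega_2,\dots,\omega_T).
\]
```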
We are ready to calculate the stochastically stable states under complete information.
Theorem 1. Under free information acquisition, for large enough N, one of PS_a and PS_b is uniquely stochastically stable; in particular, if group A is stronger in preferences than group B, or sufficiently larger than group B, then PS_a is uniquely stochastically stable.
When the cost is null, players can freely learn the group of their opponent. Therefore, in the long run, they succeed in coordinating on their favorite action with those who are similar in preference. Hence, n AA t always converges to N A , and n BB t always converges to 0. This result rules out Monomorphic States and the other four Polymorphic States: only PS a and PS b are left. Which of the two is selected depends on strength in preferences and group size. Two effects determine the results in the long run. Firstly, if π A = π B , PS a is uniquely stochastically stable. The majority prevails in inter-group interactions if the two groups are equally strong in preferences.
Secondly, if π_A ≠ π_B, there is a trade-off between strength in preferences and group size.
When PS_a is selected, either group A is stronger in preferences than group B, or group A is sufficiently larger than group B. In both situations, the number of mistakes necessary to leave PS_a is bigger than the number necessary to leave PS_b: in a sense, more mistakes are needed to make b a best reply for A players than to make a a best reply for B players. Therefore, every player will play action a in inter-group interactions. Similar reasoning applies when PS_b is selected. Interestingly, in both cases, only the players of one group need to learn their opponent's group: the players of the group that is weaker in preferences or sufficiently smaller than the other.
Unlike in the analysis of Neary, if learning the opponent's group is costless, the Monomorphic States are never stochastically stable. This result is a consequence of the possibility of conditioning the action on the group. Indeed, if players can freely learn the opponent's group, they will always play their favorite action inside the group.
We provide two numerical examples to explain how the model works in Figures 1 and 2. We represent just the inter-group state n^I_t = (n^{AB}_t, n^{BA}_t), hence a two-dimensional dynamics. Red states represent the basin of attraction of (0, 0), while green states represent that of (N_A, N_B). From gray states, there are paths of zero resistance both to (0, 0) and to (N_A, N_B). Any path within red states that involves more players playing a has positive resistance; any path within green states that involves fewer players playing a has positive resistance. The radius of (0, 0) is equal to the coradius of (N_A, N_B): it is the minimum resistance of a path from (0, 0) to the gray states. The coradius of (0, 0) is equal to the radius of (N_A, N_B): it is the minimum resistance of a path from (N_A, N_B) to the gray states.
Firstly, consider the example in Figure 1, with N_A = 10, N_B = 5, π_A = 8, Π_A = 10. In this case, R(10, 5) = CR(0, 0) = 1, while R(0, 0) = CR(10, 5) = 3. Hence, (0, 0) is the uniquely stochastically stable state of the inter-group dynamics. We give here a short intuitive explanation. Starting from (0, 0), the minimum-resistance path to the gray states is the one that reaches (0, 3). The minimum-resistance path from (10, 5) to the gray states is the one that reaches (9, 5). Hence, fewer mistakes are needed to exit from the green states than from the red states, and PS_b = (10, 0, 0, 0) is uniquely stochastically stable.
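The radii in this example can be recovered by counting best-reply thresholds in the reduced inter-group dynamics. Group B's payoff values for this example did not survive extraction, so `PiB` and `piB` below are hypothetical choices of ours (normalized so that PiB + piB = PiA + piA), picked so that the computed radii match the ones reported in the text:

```python
# Mistake-counting sketch for the Figure 1 example (inter-group dynamics only).
NA, NB = 10, 5
PiA, piA = 10, 8        # values from the text
PiB, piB = 17, 1        # hypothetical: the paper's values for group B are missing

# Smallest number of B-players playing a with A that makes a a (weak) best
# reply for an A-player in inter-group interactions:
nBA_threshold = min(n for n in range(NB + 1) if PiA * n >= piA * (NB - n))
# Smallest number of A-players playing a with B that makes a a (weak) best
# reply for a B-player:
nAB_threshold = min(n for n in range(NA + 1) if piB * n >= PiB * (NA - n))

R_00 = nBA_threshold             # B-mistakes needed to leave (0, 0)
R_NANB = NA - nAB_threshold + 1  # A-mistakes needed to leave (NA, NB)
print(R_00, R_NANB)              # -> 3 1, matching R(0,0) = 3, R(10,5) = 1
```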
Secondly, consider the example in Figure 2.

Incomplete Information with Costly Acquisition
In this section, we assume that each player cannot freely learn the group of her/his opponent. Each player can buy this information at cost c > 0. We refer to this condition as costly information acquisition. It is trivial to notice that Lemma 3 is no longer valid. Indeed, since players learn the group of their opponent only upon paying a cost, not every player pays it, and the dynamics are no longer separable. This time, Z = {a, b, ab, ba, aa, bb}. It is trivial to show that aa and bb are strictly dominated: each yields the same coordination outcomes as a and b, respectively, but at cost c > 0, ∀i ∈ N and ∀ω_t ∈ Ω. We define the set of undominated behaviors as Z^o = {a, b, ab, ba}, with z^o_i being an undominated behavior of player i.
Equations (5)-(8) are the payoffs at period t for a player i ∈ K currently playing a or ab.
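These equations did not survive extraction; for a player i ∈ A currently playing a (so that i subtracts her/himself from n^{AA}_t), they should take a form along these lines (our reconstruction, normalized by the N − 1 possible opponents and assuming zero payoff under miscoordination):

```latex
% Reconstruction (our notation), for i \in A currently playing a:
\begin{align*}
U_i(a,\omega_t)  &= \frac{\Pi_A\bigl[(n^{AA}_t - 1) + n^{BA}_t\bigr]}{N-1},\\
U_i(b,\omega_t)  &= \frac{\pi_A\bigl[(N_A - n^{AA}_t) + (N_B - n^{BA}_t)\bigr]}{N-1},\\
U_i(ab,\omega_t) &= \frac{\Pi_A\,(n^{AA}_t - 1) + \pi_A\,(N_B - n^{BA}_t)}{N-1} - c,\\
U_i(ba,\omega_t) &= \frac{\pi_A\,(N_A - n^{AA}_t) + \Pi_A\, n^{BA}_t}{N-1} - c.
\end{align*}
```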
Note that if c = 0, then aa = a and bb = b. We begin the analysis with the unperturbed dynamics.

Unperturbed Dynamics
With respect to Section 3, there are no additional random elements. Therefore, ω_{t+1} = F(ω_t, θ_{t+1}). Nine states can be absorbing under this specification.

Lemma 5.
Under costly information acquisition, there are nine possible absorbing states. We summarize all the relevant information in Table 4. The reader can note two differences with respect to Section 3: firstly, some states are absorbing if and only if certain conditions hold; secondly, there is one more possible absorbing state, namely (N_A, N_A, 0, 0). Such an absorbing state was also possible in Neary under the same conditions on payoffs and group size.
Where we write "none", we mean that a state is always absorbing for every value of group size, payoffs, and/or the cost. We name (N A , N A , 0, 0) the Type Monomorphic State (TS from now on): each group is playing its favorite action in this state, causing miscoordination in inter-group interactions. In both MS and TS, no player is buying the information, while in PS, at least one group is buying the information.
Monomorphic States are absorbing states for every value of group size, payoffs, and cost. Indeed, when each player is playing one action with any other player, players do not need to learn their opponent's group (the information cost does not matter): they best reply to these states by playing the same action.
Polymorphic States are absorbing if and only if the cost is low enough: if the cost is too high, buying the information is too expensive, and players best reply to Polymorphic States by playing a or b. The Type Monomorphic State is absorbing if group B is either sufficiently close in size to group A or strong enough in preferences for its favorite action and if the cost is high enough. The intuition is the following. On the one hand, if the cost is high and if group B is weak in preferences or small enough, every player of group B best replies to TS by playing a. On the other hand, if the cost is low enough, every player best replies to this state by buying the information and conditioning the action.
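This absorbing-state logic for TS can be illustrated numerically. The function name and the payoff values below are our own illustrative choices, consistent with the payoff definitions above but not taken from the paper:

```python
# Best replies of a group-B player at TS = (NA, NA, 0, 0), where group A
# plays a and group B plays b unconditionally.  A sketch with illustrative
# values; per-opponent normalization by N - 1 as in our earlier formulas.
def B_payoffs_at_TS(NA, NB, PiB, piB, c):
    N = NA + NB
    u_b = PiB * (NB - 1) / (N - 1)   # keep playing b with everyone
    u_a = piB * NA / (N - 1)         # switch to a with everyone
    u_cond = u_b + u_a - c           # buy the information: b with B, a with A
    return u_b, u_a, u_cond

# High cost, group B strong in preferences: b remains a best reply, TS absorbing.
u_b, u_a, u_cond = B_payoffs_at_TS(NA=10, NB=5, PiB=16, piB=2, c=2.0)
assert u_b >= u_a and u_b >= u_cond

# Low cost: conditioning becomes profitable, so TS is no longer absorbing.
u_b, u_a, u_cond = B_payoffs_at_TS(NA=10, NB=5, PiB=16, piB=2, c=0.1)
assert u_cond > u_b
```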

Perturbed Dynamics
We now introduce perturbed dynamics. In this case, we assume that players can make two types of mistakes: a mistake in the information choice and a mistake in the coordination choice. Choosing the wrong behavior, in this case, can mean either. With probability η, a player who is given the revision opportunity at period t chooses to buy the information when it is not optimal; with probability ε, s/he makes a mistake in the coordination choice. We could alternatively have specified a single probability of choosing a wrong behavior or strategy.
The logic behind our assumption is to capture behaviorally relevant mistakes. We assume a double punishment mechanism for players choosing by mistake the information level and the coordination action. Specifically, our mistake counting is not influenced by our definition of behaviors. We could have made the same assumption starting from the standard definition of strategies assuming that players can make different mistakes in choosing the two actions that constitute the strategy. Our assumption is in line with works such as Jackson and Watts [12] and Bhaskar and Vega-Redondo [13], which assume mistakes in the coordination choice and the link choice.
ψ_{t+1} = ψ^ε_{t+1} ∪ ψ^η_{t+1} is the set of players who make a mistake at period t among those who are given the revision opportunity: ψ^ε_{t+1} is the set of players who make a mistake in the coordination choice, and ψ^η_{t+1} the set of players who make a mistake in the information choice.
Since we assume two types of mistakes, the concept of resistance changes: we need to consider three types of resistance. We call r_ε(ω_t, . . . , ω_s) the resistance of a path from state ω_t to state ω_s with ε mistakes (mistakes in the coordination choice), r_η(ω_t, . . . , ω_s) the resistance with η mistakes (mistakes in the information choice), and r_εη(ω_t, . . . , ω_s) the resistance with mistakes in both the coordination choice and the information choice. Since we make no further assumptions on ε and η (the probabilities of making mistakes are uniform), we can assume η ∝ ε.
For example, think about ω_t = MS_a, and suppose one player from B is given the revision opportunity at period t and makes a mistake both in the information choice and in the coordination choice: s/he learns the group and plays a with A and b with B. This mistake delineates a path leaving MS_a at the cost of one εη mistake. Next, think about ω_t = TS: the transition from TS to (N_A, N_A − 1, 0, 0) happens with one η mistake. One player from A makes a mistake in the information choice and then optimally chooses ab. In this case, r_η(TS, . . . , (N_A, N_A − 1, 0, 0)) = 1. With similar reasoning, r_ε(MS_a, . . . , (N_A − 1, N_A − 1, N_B, N_B)) = 1: a player of group A makes a mistake in the coordination choice and chooses b.
Before providing the results, we explain why using behaviors instead of strategies does not influence the stochastic stability analysis. Consider all sixteen strategies as presented in Section 2, and just one kind of mistake in the choice of the strategy. Take two strategies s′, s″ ∈ z and a third strategy s‴ ∈ z′. Now consider the state ω̄, where s_i = s′, ∀i ∈ N, and the state ω′, where s_i = s′, ∀i ∈ {0, . . . , N − m − 1} and s_j = s″, ∀j ∈ {N − m, . . . , N}. Since s′ and s″ are both payoff-equivalent and behaviorally equivalent, s′ and s″ are best reply strategies for every player in both states ω̄ and ω′. Therefore, at each period, every player who is given the revision opportunity in state ω̄ or ω′ chooses s′ and s″ with equal probability. Now consider the state ω″ where s_i = s‴, ∀i ∈ N. When considering the transition between ω̄ and ω″, the number of mistakes necessary for the transition is the same whether the path passes through ω′ or not, because the best reply strategies are the same in both ω′ and ω̄. Therefore, when computing the stochastically stable state, we can neglect s″ and ω′.
We divide this part of the analysis into two cases, the first one where the cost is low and the second one when the cost is high.

Low Cost
This section discusses the case in which c is strictly positive but sufficiently small.

Corollary 1.
Under costly information acquisition, if 0 < c < (1/(N − 1)) min{π_A, π_B}, MS and PS are absorbing states, while TS is not an absorbing state.
The proof is straightforward from Table 4. In this case, there are eight candidates to be stochastically stable equilibria.
Theorem 2. Under costly information acquisition, for large enough N, take 0 < c < (1/(N − 1)) min{π_A, π_B}; then, under the conditions of Theorem 1, PS_a is uniquely stochastically stable.
The conditions are the same as in Theorem 1. When the cost is low enough, whenever a player can buy the information, s/he does it. Consequently, the basins of attraction of both Monomorphic States and Polymorphic States have the dimension they had under free information acquisition. Due to these two effects, the results are the same as under free information acquisition. This result is not surprising per se but serves as a robustness check of the results of Section 3.2.

High Cost
In this part of the analysis, we focus on a case when only MS and TS are absorbing states.
Define the following set of values for the cost. The proof is straightforward from Table 4, and therefore, we omit it; we previously gave the intuition behind this result. Let us first consider the case in which TS is not an absorbing state.
Theorem 3. Under costly information acquisition, for large enough N, if TS is not absorbing and group A is sufficiently large or strong enough in preferences, then MS_a is uniquely stochastically stable.
If group A is sufficiently large or strong enough in preferences, the minimum number of mistakes to exit the basin of attraction of MS_a is higher than the minimum number of mistakes to exit that of MS_b. Therefore, MS_a is uniquely stochastically stable: every player plays behavior a in the long run. We now analyze the case in which TS is also a strict equilibrium.

Theorem 4. Under costly information acquisition, for large enough N: if group A is stronger in preferences than group B or sufficiently larger, then MS_a is uniquely stochastically stable; if the same holds for group B, then MS_b is; and if both groups are strong enough in preferences or sufficiently close in size, then TS is uniquely stochastically stable.
Moreover, the second part of the theorem covers the case in which all the above conditions fail simultaneously. We divide the statement of the theorem into two parts for technical reasons; however, the reader can understand the results from the first three conditions. The first condition expresses a situation where group A is stronger in preferences than group B, or group A is sufficiently larger than group B. In this case, there is an asymmetry between the costs of exiting the two basins of attraction of MS_a and MS_b: exiting the first requires more mistakes than exiting the second. Moreover, reaching MS_a from TS requires fewer mistakes than reaching MS_b from TS. For this reason, R(MS_a) > CR(MS_a), and MS_a is uniquely stochastically stable in this case. Similar reasoning applies to the second condition.
The third condition expresses a case where both groups are strong enough in preferences or have sufficiently similar sizes. Many mistakes are required to exit from TS, compared to how many mistakes are required to reach TS from the two MS. Indeed, TS is the state where both groups are playing their favorite action. Since they are both strong in preferences or large enough, in this case, all the players play their favorite action in the long run, but they miscoordinate in inter-group interactions.
The results of Theorems 3 and 4 reach the same conclusions as Neary. However, our analysis allows us to affirm that only when the cost is high is an MS or the TS stochastically stable. This result enriches the previous analysis.
As a further contribution, comparing these results with those in Section 4.2.1, we can give the two conditions for inter-group miscoordination to happen in the long run. First, the cost of learning the opponent's group must be so high that players never learn it. Second, both groups must be strong enough in preferences or sufficiently close in size.
The following lemma states what happens when the cost takes intermediate values. When the cost falls slightly below the level of Section 4.2.2, TS is no longer absorbing. Therefore, only PS and MS can be stochastically stable when the cost is in the interval above. However, not all the PS can be stochastically stable: only the two in which all players play their favorite action in inside-group interactions. The intuition of this result is simple: if players condition their action on the group in the long run, they play their favorite action with those with similar preferences.
We do not characterize exactly when the MS are stochastically stable and when the PS are: we leave this question for future analysis. Nevertheless, given the results of Sections 4.2.1 and 4.2.2, we expect that for higher levels of the cost an MS is stochastically stable, and for lower levels a PS is.

Discussion
The results of our model relate to three strands of the literature: firstly, the literature on social conventions; secondly, the literature on stochastic stability analysis; and lastly, the literature on costly information acquisition.
Concerning social conventions, many works study the long-run existence of heterogeneous strategy profiles. We started from the original model of Neary [7], which considers players heterogeneous in preferences, but with a smaller strategy set than ours (heterogeneity has been discussed in earlier works such as Smith and Price [14], Friedman [15], Cressman et al. [16], Cressman et al. [17], and Quilter [18]). Neary's model gives conditions for the stochastic stability of a heterogeneous strategy profile that causes miscoordination in inter-group interactions under random matching. Neary and Newton [19] expands the idea to investigate the role of different classes of graphs on the long-run outcome. It finds conditions on graphs under which a heterogeneous strategy profile is stochastically stable, and it also considers the choice of a social planner who wants to induce heterogeneous or homogeneous behavior in a population.
Carvalho [20] considers a similar model, where players choose their actions from a set of culturally constrained possibilities and the heterogeneous strategy profile is labeled as miscoordination. It finds that cultural constraints drive long-run miscoordination. Michaeli and Spiro [21] studies a game between players who have heterogeneous preferences and who feel pressure when behaving differently from others; it characterizes the circumstances under which a biased norm can prevail over an unbiased one. Tanaka et al. [22] studies how local dialects survive in a society with an official language. Naidu et al. [23] studies the evolution of egalitarian and inegalitarian conventions in a framework whose asymmetry resembles the language game. Likewise, Belloc and Bowles [24] examines the evolution and persistence of inferior cultural conventions.
We introduce the assumption that players can condition their action on the group if they pay a cost. This assumption helps to clarify the conditions for the stability of the Type Monomorphic State, where players miscoordinate in inter-group interactions. We show that a low cost favors inter-group coordination: incomplete information, a high cost, strength in preferences, and group size are the key drivers of inter-group miscoordination. Like many works in this literature, we show the importance of strength in preferences and group size for equilibrium selection.
Concerning the network formation literature, Goyal et al. [25] conducts an experiment on the language game, testing whether players segregate or conform to the majority.
van Gerwen and Buskens [26] suggests a variant of the language game similar to ours, but in a network model, to study the influence of partner-specific behavior on coordination. Concerning auction theory, He [27] studies a framework where each individual in a population divided into two types must choose between two skills: a "majority" and a "minority" one. It finds that minorities are advantaged in competition contexts rather than in coordination ones. He and Wu [28] tests the role of compromise in the battle of the sexes with an experiment.
Like these works, we show that group size and strength in preferences matter for long-run equilibrium selection. The states in which the action preferred by the minority is played in most interactions (MS_b or PS_b) are stochastically stable provided that the minority is strong enough in preferences or sufficiently large.
A parallel field is that of bilingual games, such as those proposed by Goyal and Janssen [29] and Galesloot and Goyal [30]: these models consider situations in which players have homogeneous preferences over two coordination outcomes but can coordinate on a third action at a given cost.
Concerning the technical literature on stochastic stability, we contribute by applying standard stochastic stability techniques to an atypical context, namely costly information acquisition. Specifically, we show that with low cost levels, the Polymorphic States in which all players of one group condition their action on the group are stochastically stable. Interestingly, only one group of players needs to learn their opponents' group. With high cost levels, the Monomorphic States in which no player conditions her/his action on the group are stochastically stable. Since the seminal works by Bergin and Lipman [31] and Blume [32], many studies have tested the role of different mistake models in equilibrium selection. We use uniform mistakes; introducing different mistake models could be an interesting exercise for future studies.
Other works contribute to the literature on stochastic stability from a theoretical perspective (see Newton [45] for an exhaustive review of the field). Recently, Newton [46] has expanded the domain of behavioral rules covered by stochastic stability results. Sawa and Wu [47] shows that with loss-averse individuals, the stochastic stability of risk-dominant equilibria is no longer guaranteed. Sawa and Wu [48] introduces reference-dependent preferences and analyzes the stochastic stability of best-response dynamics. Staudigl [49] examines stochastic stability in an asymmetric binary-choice coordination game.
Concerning the literature on costly information acquisition, many works interpret the information's cost as costly effort (see the seminal contributions by Simon [50] and Grossman and Stiglitz [51]). Our paper is one of them. Many studies place this framework in a sender-receiver game; this is the case of Dewatripont and Tirole [52], which builds a model of costly communication in a sender-receiver setup.
More recent contributions to this literature are Dewatripont [53], Caillaud and Tirole [54], Tirole [55], and Butler et al. [56]. Bilancini and Boncinelli [57] applies this model to persuasion games with labeling. Both Bilancini and Boncinelli [58] and Bilancini and Boncinelli [59] consider coarse-thinking receivers, combining costly information acquisition with the theory of Jehiel [60]. Rational inattention is a recent field in which the information cost is endogenous (see Mackowiak et al. [61] for an exhaustive review); we instead assume that the cost is exogenous and homogeneous across players.
Güth and Kliemt [62] was the first to use costly information acquisition in evolutionary game theory, in a game of trust; it finds conditions under which developing a conscience can be evolutionarily stable. More recently, Berger and De Silva [63] uses a similar concept in a deterrence game where agents can buy costly information about the past behavior of their opponents.
Many works use similar concepts of cost in the evolutionary game theory literature on coordination games. For example, Staudigl and Weidenholzer [64] considers a model where players can pay a cost to form links. The main finding is that when agents are constrained in the number of possible interactions, the payoff-dominant convention emerges in the long run.
The work by Bilancini and Boncinelli [65] extends Staudigl and Weidenholzer [64] by introducing the possibility that interacting with a different group is costly for a player. It finds that when this cost is low, the payoff-dominant strategy is the stochastically stable one; when the cost is high, the two groups in the population coordinate on two different strategies, one on the risk-dominant and the other on the payoff-dominant. Similarly, Bilancini et al. [66] studies the role of cultural intolerance and assortativity in a coordination context: a population is divided into two cultural groups, and each group sustains a cost from interacting with the other. It finds interesting conditions under which cooperation can emerge even with cultural intolerance.

Conclusions
We can summarize our results as follows. When players can learn the group of their opponent at a low cost, they always coordinate: they play their favorite action with players of their own group, while in inter-group interactions they play the favorite action of the group that is stronger in preferences or sufficiently large. If the cost is high, players never learn the group of their opponent; then either all players play the same action with everyone, or each plays her/his favorite action.
By comparing Sections 4.2.1 and 4.2.2, we can see the impact of varying the cost level on the long-run results. Clearly, a low cost favors inter-group coordination. However, a change in the cost level produces two effects that deserve further investigation. The first concerns the change in the payoffs from interactions between players; the second concerns the change in the purchase of information.
Consider a starting situation where the cost is low. Players always coordinate on their favorite action in inside-group interactions. If the cost increases, players stop learning their opponent's group (hence, they stop paying the cost) and begin to play the same action against every other player. If this happens, either a Monomorphic State is established in the long run, or the Type Monomorphic State emerges. In the first case, one group of players coordinates on its second-best option even in inside-group interactions; for this group, there could be a loss in terms of welfare. In the second case, players miscoordinate in inter-group interactions, and hence all of them could suffer a welfare loss.
Nevertheless, when the cost is low, there is a "free-riding" behavior that vanishes if the cost increases. In fact, with low cost levels, only one group needs to pay the cost, while the other never does. In one case, players of group A play their favorite action in both inside-group and inter-group interactions, so they never need to pay the cost, while group B always has to pay it; in the other case, the opposite happens. Therefore, when the cost increases, one of the two groups benefits from no longer paying for the information. Future studies could address the implications of this trade-off between successful coordination and the possibility of not paying the cost.
We conclude with a short comparison of our results with those of Neary [7]. It is worthwhile to highlight a contrast that follows from the possibility of conditioning the action on the opponent's group. In the model of Neary, a change in the strength of preferences or in the size of one group does not affect the behavior of the other group. We find the same effect in our model when the cost is high: for example, when MS_a is stochastically stable and group B becomes strong enough in preferences or sufficiently large, the new stochastically stable state becomes TS, so group A does not change its behavior. However, when the cost is sufficiently low, a change in the payoffs or size of one group influences the other group's behavior in inter-group interactions: for instance, when PS_a is stochastically stable, if group B becomes strong enough in preferences or sufficiently large, PS_b becomes stochastically stable. Hence, both groups change the way they behave in inter-group interactions.
Nevertheless, the transition from MS_a to TS and the one from PS_a to PS_b can be interpreted similarly. In both cases, both groups keep playing their favorite action in inside-group interactions, and what happens in inter-group interactions depends on strength in preferences and group size. Therefore, in this respect, the behavioral interpretation of our results is similar to Neary's.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A. Proofs
Proof of Lemma 1. We have to show formally that each strategy inside the same behavior is behaviorally and payoff-equivalent for each player. Consider a player i ∈ K. Define g^K_i(s_{-i}) and g^{K′}_i(s_{-i}) as the frequencies of successful coordination for i on action a with group K and with group K′, given strategy profile s_{-i}. Therefore, if (0, a, a, a) is the maximizer, then (0, a, a, b), (0, a, b, a), and (0, a, b, b) are maximizers as well. Hence, in this case, i maximizes her/his payoff by choosing behavior a. Moreover, consider s_{-i} = (0, a, a, b)^{N−1} and s_{-i} = (0, a, a, a)^{N−1}: in this case, i's payoff is the same under the two profiles. Thanks to the symmetry of the payoff matrix, the argument stands for all strategies and behaviors. This completes the proof.
Proof of Lemma 2. Consider a player i ∈ K currently playing behavior a who is given a revision opportunity at period t. Let g^K_i(z_{-i}(t)) be the frequency of successful coordinations of player i on action a with group K at period t, given z_{-i}(t); the payoff U^i_z(a, z_{-i}(t)) is determined by this frequency. Analogously, g^K_i(ω_t) is the frequency of successful coordinations of player i on action a with group K at period t, given ω_t and the fact that player i is currently playing z_i(t). Therefore, U^i_a(a, ω_t) = U^i_{aa}(a, ω_t) = U^i_{ab}(a, ω_t); equally, U^i_b(a, ω_t) = U^i_{bb}(a, ω_t) = U^i_{ba}(a, ω_t). Thanks to the symmetry of the payoff matrix, the argument stands for all strategies and behaviors.

Appendix A.1. Proofs of Section Complete Information with Free Acquisition
Proof of Lemma 3. Consider a player i ∈ K currently playing aa who is given the revision opportunity at period t. On the one hand, ∀ n^{KK}_t, U^i_a(ab, ω_t) = U^i_a(aa, ω_t). On the other hand, ∀ n^{KK′}_t, U^i_a(ba, ω_t) = U^i_a(aa, ω_t). Therefore, player i chooses between aa and ab depending on n^{KK′}_t, and between ba and aa depending on n^{KK}_t. Moreover, if player i chooses ab instead of aa, n^{KK}_{t+1} = n^{KK}_t but n^{KK′}_{t+1} < n^{KK′}_t; if player i chooses ba instead of aa, n^{KK}_{t+1} < n^{KK}_t but n^{KK′}_{t+1} = n^{KK′}_t. This completes the proof.
With abuse of notation, we call best reply (BR) the action optimally chosen by a player in one of the three dynamics. For example, if a player of group A earns the highest payoff by playing a against a player of group B, we say that a is her/his BR. We can do this in the context of complete information because the dynamics are separable.
Proof of Lemma 4. Thanks to Lemma 3, we can consider the three separate dynamics: n^{AA}_t, n^{BB}_t, and n^I_t.

Inside-group interactions.
Firstly, we prove the result for n^{AA}_t; the argument then stands for n^{BB}_t thanks to the symmetry of the payoff matrix. We have to show that all the states in ω_R have an absorbing component for n^{AA}_t, that is, 0 or N_A. When n^{AA} = N_A, ∀ i ∈ A, a is the BR against group A at period t; hence, F_1(N_A, θ_{t+1}) = N_A. Symmetrically, if n^{AA} = 0, b is always the BR, and so F_1(0, θ_{t+1}) = 0. Therefore, N_A and 0 are fixed points for n^{AA}_t. We need to show that these states are absorbing, that all the other states are transient, and that there are no cycles. Consider a player i ∈ A who is given the revision opportunity at period t. We define n̄_A as the minimum number of A players playing a such that a is the BR, and n_A as the maximum number such that b is the BR; their values follow from Equations (1)-(4). Assume n^{AA}_t ≥ n̄_A. There is always a positive probability that a player not playing a is given the revision opportunity; hence, F_1(n^{AA}_t, θ_{t+1}) ≥ n^{AA}_t. Symmetrically, if n^{AA}_t ≤ n_A, F_1(n^{AA}_t, θ_{t+1}) ≤ n^{AA}_t. We now prove that if n^{AA}_t ≤ n_A, then Pr(lim_{t→∞} n^{AA}_t = 0) = 1; we prove this case, and the analogous result for n^{AA}_t ≥ n̄_A stands thanks to the symmetry of the payoff matrices. Consider a period s in a state n^{AA}_s ≤ n_A. For every player, b is the BR. Define p = Pr(n^{AA}_{s+1} = n^{AA}_s); such a probability represents the event that only players already playing b are given the revision opportunity. Then Pr(n^{AA}_{s+2} = n^{AA}_s) = p² and, in general, Pr(n^{AA}_{s+k} = n^{AA}_s) = p^k.
As k → ∞, Pr(n^{AA}_{s+k} = n^{AA}_s) → 0; therefore, the process eventually moves downward toward 0. Next, consider n_A < n^{AA}_0 < n̄_A. For every i playing a, b is the BR, while for every i playing b, a is the BR. There are no absorbing states in this region. If only players playing a are given the revision opportunity, they all choose b, and if enough of them are given the revision opportunity, n^{AA}_1 ≤ n_A. The opposite happens if only players playing b are given the revision opportunity.
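The unperturbed inside-group dynamics described above can be illustrated with a small simulation. This is a sketch under assumed payoffs, not the paper's exact specification: each A player is matched with all other A players, earning Π_A per opponent coordinating on a and π_A per opponent coordinating on b; all function names and parameter values are ours.

```python
import random

def a_is_br(n_other_a, N, Pi, pi):
    # a is the best reply iff the payoff from coordinating on a with the
    # n_other_a opponents playing a exceeds the payoff from b with the rest
    return n_other_a * Pi > (N - 1 - n_other_a) * pi

def inside_group_dynamics(N=10, Pi=3, pi=1, n0=4, T=10_000, seed=0):
    """Best-response dynamics without mistakes: one randomly drawn player
    revises each period; the process is absorbed at 0 or N (Lemma 4)."""
    rng = random.Random(seed)
    plays_a = [True] * n0 + [False] * (N - n0)
    for _ in range(T):
        i = rng.randrange(N)                     # revision opportunity
        n_other_a = sum(plays_a) - plays_a[i]    # opponents playing a
        plays_a[i] = a_is_br(n_other_a, N, Pi, pi)
    return sum(plays_a)
```

With Π = 3 and π = 1, a start at n^{AA}_0 = 4 out of 10 lies above the threshold n̄_A, so every revision favors a and the run is absorbed at N_A = 10; starting at 0 the process stays at 0. Any run ends at one of the two fixed points, consistent with the proof above.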

Inter-group interactions.
We now pass to the analysis of n^I_t. We define four important values for n^{AB} and n^{BA}: T_A, T_B, D_A, and D_B. Given these values, we also define two sets of states, Ω^a_I and Ω^b_I: Ω^a_I = {n^I | n^{BA} ≥ T_A and n^{AB} ≥ T_B} and Ω^b_I = {n^I | n^{BA} ≤ D_A and n^{AB} ≤ D_B}. With computations similar to those for n^{AA}_t, we can say that (0, 0) and (N_A, N_B) are two fixed points for n^I_t. Are they absorbing states? Consider the choice of a player i ∈ A against a player j ∈ B and vice-versa. There can be four possible combinations of states: states in which a is the BR for every player, states in which b is the BR for every player, states in which a is the BR ∀ i ∈ A while b is the BR ∀ j ∈ B, and states in which the opposite is true. Let us call the third region Ω^{ab}_I and the fourth Ω^{ba}_I. Firstly, we prove that Ω^a_I and Ω^b_I are the regions where a and b, respectively, are the BR for every player. Secondly, we prove that there is no absorbing state in Ω^a_I other than (N_A, N_B), and none in Ω^b_I other than (0, 0). Assume that player i ∈ A is given a revision opportunity at period t. Since T_A is defined as the minimum value such that a is the BR against group B, ∀ n^{BA}_t ≥ T_A, a is the BR against group B for every i ∈ A. Now, assume that j ∈ B is given the revision opportunity. Since T_B is defined as the minimum value such that a is the BR against group A, ∀ n^{AB}_t ≥ T_B, a is the BR ∀ j ∈ B. Therefore, if n^I_0 ∈ Ω^a_I, then n^I_s ∈ Ω^a_I ∀ s ≥ 0; similarly, if n^I_0 ∈ Ω^b_I, then n^I_s ∈ Ω^b_I ∀ s ≥ 0. Consider a generic state in Ω^a_I other than (N_A, N_B): in such a state, there is always a probability p that a player not playing a is given the revision opportunity.
Therefore, if n^I_t ∈ Ω^a_I \ (N_A, N_B), Pr(F_{2,3}(n^I_t, θ_{t+1}) > n^I_t) ≥ p, meaning that the process moves toward (N_A, N_B) with probability at least p. Consequently, the process converges to (N_A, N_B) with probability one; symmetrically, inside Ω^b_I it converges to (0, 0). We now consider Ω^{ab}_I and Ω^{ba}_I. Take n^I_0 ∈ Ω^{ab}_I: at each period, there is a positive probability that only players of group A are given the revision opportunity; since for them a is the best reply, in the next period there will be weakly more A players playing a. Hence, if enough players of A currently playing b are given the revision opportunity, n^I_1 ∈ Ω^a_I. By the same reasoning, there is also a positive probability that only players of B are given the revision opportunity, and hence that n^I_1 ∈ Ω^b_I. The same can be said for every state in Ω^{ba}_I. Hence, starting from every state in Ω^{ab}_I ∪ Ω^{ba}_I, there is always a positive probability of ending up in Ω^a_I or Ω^b_I.

Lemma A1. Under complete information, Pr(lim_{t→∞} n^I_t = (N_A, N_B)) = 1 − Pr(lim_{t→∞} n^I_t = (0, 0)); Pr(lim_{t→∞} n^{AA}_t = N_A) = 1 − Pr(lim_{t→∞} n^{AA}_t = 0); Pr(lim_{t→∞} n^{BB}_t = N_B) = 1 − Pr(lim_{t→∞} n^{BB}_t = 0).
Proof. We prove the result for n^I_t; the argument stands for the two other dynamics thanks to the symmetry of the payoff matrix. Firstly, note that whenever the process starts in Ω^a_I ∪ Ω^b_I, the lemma holds thanks to the proof of Lemma 4. We need to show that this is also the case when the process starts inside Ω^{ab}_I ∪ Ω^{ba}_I. We prove the result for Ω^{ab}_I; by the same logic, the result stands for Ω^{ba}_I by symmetry of the payoff matrix. Take n^I_0 ∈ Ω^{ab}_I. Define as p_a the probability of extracting m players from A currently playing b who would change their action to a if given a revision opportunity, and as p_b the probability of extracting m players from B currently playing a who would change their action to b if given a revision opportunity. The probability 1 − p_a − p_b covers all the other possibilities.
Let us take k steps forward in time and consider period k + d. Clearly, the probability of being in Ω^a_I (Ω^b_I) is now greater than or equal to (p_a)^k ((p_b)^k): we know that once in Ω^a_I (Ω^b_I), the system stays there. Consequently, the probability of being in Ω^{ab}_I ∪ Ω^{ba}_I is lower than (1 − p_a − p_b)^{k+d}. Taking the limit as d → ∞, lim_{d→∞} Pr(n^I_{k+d} ∈ Ω^{ab}_I ∪ Ω^{ba}_I) = 0. This means that if we start in a state in Ω^{ab}_I, there is no way of remaining in Ω^{ab}_I ∪ Ω^{ba}_I in the long run; hence, the system ends up either in Ω^a_I or in Ω^b_I, and given this, we know that it ends up either in (0, 0) or in (N_A, N_B).
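The tail bound just derived can be displayed compactly in the paper's notation (a restatement of the proof's inequality, not an additional result):

```latex
\Pr\!\left[ n^{I}_{k+d} \in \Omega^{ab}_{I} \cup \Omega^{ba}_{I} \right]
  \;\le\; (1 - p_a - p_b)^{\,k+d}
  \;\xrightarrow[d \to \infty]{}\; 0 ,
\qquad\text{hence}\qquad
\Pr\!\left[ \lim_{t\to\infty} n^{I}_{t} \in \{(0,0),\,(N_A,N_B)\} \right] = 1 .
```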
Corollary A1. Under complete information: Pr(lim_{t→∞} n^I_t = (N_A, N_B)) = 1 IFF n^I_0 ∈ Ω^a_I, and Pr(lim_{t→∞} n^I_t = (0, 0)) = 1 IFF n^I_0 ∈ Ω^b_I. Pr(lim_{t→∞} n^{AA}_t = N_A) = 1 IFF n^{AA}_0 ∈ [n̄_A, N_A], and Pr(lim_{t→∞} n^{AA}_t = 0) = 1 IFF n^{AA}_0 ∈ [0, n_A]. Pr(lim_{t→∞} n^{BB}_t = N_B) = 1 IFF n^{BB}_0 ∈ [n̄_B, N_B], and Pr(lim_{t→∞} n^{BB}_t = 0) = 1 IFF n^{BB}_0 ∈ [0, n_B].
This result is a consequence of the previous lemmas, and therefore the proof is omitted. Since the only two absorbing states of the dynamics n^I_t are (0, 0) and (N_A, N_B), they are the only two candidates for stochastic stability. From now on, we denote (0, 0) by I^b_n and (N_A, N_B) by I^a_n. We denote by 0_A the state where all players of group A play b with group A, and by 0_B the state where all players of group B play b with group B.
Let us call E_A and E_B the two values at which players in A and in B, respectively, are indifferent between playing a and b in inter-group interactions.
From now on, we often assume N large enough when comparing the arguments inside ceiling functions.
Lemma A2. Under free information acquisition, for large enough N, R(I^b_n) = CR(I^a_n) = E_A for all values of payoffs and group sizes, while R(I^a_n) = CR(I^b_n) depends on payoffs and group sizes.

Proof. Firstly, we know from Ellison [11] that if there are just two absorbing states, the radius of one is the coradius of the other and vice-versa. Hence, R(I^b_n) = CR(I^a_n) and R(I^a_n) = CR(I^b_n). Moreover, from the proof of Lemma 4, we know that D(I^a_n) = Ω^a_I and D(I^b_n) = Ω^b_I. We prove that the minimum resistance path exiting the basin of attraction of I^b_n is the one that reaches (E_B, 0) or (0, E_A), and that the one exiting the basin of attraction of I^a_n reaches either (E_B, N_B) or (N_A, E_A). To prove the statement for I^b_n, firstly note that, once inside Ω^b_I, every step toward a state with more players playing a requires a mistake. Secondly, note that in a state outside Ω^b_I, at least one of the two groups is indifferent between playing b and a; in other words, such a state has either n^{AB} = E_B or n^{BA} = E_A or both. Hence, the minimum resistance path out of I^b_n reaches either (E_B, 0) or (0, E_A). It is straightforward to show that all the other paths have greater resistance. Since we use uniform mistakes, every mistake has the same cost and, without loss of generality, we can count each of them as 1.

Lemma A3. Under free information acquisition, for large enough N, the radius of each absorbing state of the inside-group dynamics equals the number of mistakes needed to exit its basin of attraction.

Proof. The proof is straightforward: the minimum resistance path, in terms of mistakes, required to reach one absorbing state starting from the other is the cost of exiting the basin of attraction of the first. For instance, consider R(0_A): we know from the proof of Lemma 4 that we are out of the basin of attraction of 0_A when we reach the state n̄_A. The same applies to the other states.
Proof of Theorem 1. We divide the proof among the three dynamics described so far. For n^{AA}_t, N_A is uniquely stochastically stable, and for n^{BB}_t, 0_B is uniquely stochastically stable; this follows directly from Lemma A3, and the proof is therefore omitted. Let us pass to n^I_t. We know from Lemma A2 that R(I^b_n) = E_A and that the value of R(I^a_n) depends on payoffs and group sizes. Let us first consider the case in which R(I^a_n) = N_A − E_B. It is sufficient that E_A > N_A − E_B for I^b_n to be uniquely stochastically stable: indeed, if this happens, R(I^b_n) > CR(I^b_n). This is the case IFF Equation (A1) holds. To complete the proof, we show that whenever π_B/π_A > N_B/N_A, I^a_n is the uniquely stochastically stable state. Firstly, note that in this case R(I^a_n) = N_A − E_B and R(I^b_n) = E_A; however, Equation (A1) is reversed, so I^a_n is uniquely stochastically stable. In the other case, R(I^a_n) = N_B − E_A and still R(I^b_n) = E_A; here, I^a_n is uniquely stochastically stable if N_B − E_A > E_A, which happens for every value of the payoffs (given that Π_A > π_A) and of the group sizes. We conclude that whenever π_B/π_A < N_B/N_A, PS_b is uniquely stochastically stable, and whenever π_B/π_A > N_B/N_A, PS_a is uniquely stochastically stable.
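The selection rule that closes the proof can be written as a short decision function. This is a sketch: the function name is ours, payoffs are taken as integers for exact comparison, and the knife-edge case π_B/π_A = N_B/N_A, which the theorem does not cover, is flagged explicitly.

```python
from fractions import Fraction

def stable_polymorphic_state(pi_A, pi_B, N_A, N_B):
    """Equilibrium selection under free information acquisition
    (conclusion of Theorem 1): compare pi_B/pi_A with N_B/N_A.
    With two absorbing states, Ellison's radius-coradius argument
    selects the one whose radius exceeds its coradius."""
    strength = Fraction(pi_B, pi_A)  # relative strength of preferences
    size = Fraction(N_B, N_A)        # relative group size
    if strength > size:
        return "PS_a"    # I_a^n wins: a is played in inter-group matches
    if strength < size:
        return "PS_b"    # I_b^n wins: b is played in inter-group matches
    return "knife-edge"  # boundary case, not covered by the theorem
```

For example, two equally sized groups with π_B > π_A select PS_a, while a sufficiently larger group B selects PS_b, matching the concluding sentence of the proof.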

Appendix A.2. Proofs of Section Incomplete Information with Costly Acquisition
For convenience, we call τ_1 the optimal behavior when a player decides to acquire the information: τ_1 = max(ab, ba, aa, bb), where the maximum is taken with respect to the associated payoffs.
Proof of Lemma 5. We first show that the nine states are indeed strict equilibria, then that there is no other possible equilibrium, and finally that a state is absorbing if and only if it is a strict equilibrium.

Monomorphic States.
It is easy to show that (N_A, N_A, N_B, N_B) and (0, 0, 0, 0) are two strict equilibria. We treat the first case; the argument stands for the second thanks to the symmetry of the payoff matrix. Consider a player i ∈ K who is given the revision opportunity at period t: (N_A, N_A, N_B, N_B) is a strict equilibrium since π^K_a > 0 and c > 0.

Polymorphic States.
Firstly, let us consider the case of (N_A, 0, 0, N_B). Since in this state every player is playing ab, the state is a strict equilibrium IFF max z^o_i = ab, ∀ i ∈ N. Consider next the case of (0, N_A, N_B, 0): since every player is playing ba, it must be that max z^o_i = ba. A player i ∈ K faces payoffs such that ba is the best reply behavior if c < ((N_K − 1)/(N − 1)) π^K_b; when the opposite happens, the player prefers not to acquire the information. These conditions take the form of those in Table 4. Consider the remaining four PS; they are characterized by the following fact: BR(n^{KK}) = BR(n^{KK′}) but BR(n^{K′K′}) ≠ BR(n^{K′K}). In this case, it must be that τ_i = 0 is optimal for i ∈ K while τ_j = 1 is optimal for j ∈ K′. Thanks to the symmetry of the payoff matrices, the argument proving the result for these four states is similar to the one for (N_A, 0, 0, N_B) and (0, N_A, N_B, 0). All the conditions are listed in Table 4.

Type Monomorphic State.
(N_A, N_A, 0, 0) is a strict equilibrium if a is the BR ∀ i ∈ A and b is the BR ∀ j ∈ B. Consider a player i ∈ A who is given the revision opportunity at period t: given that U^i_a(a, ω_t) > U^i_a(b, ω_t), a is the best reply behavior IFF c > (N_B/(N − 1)) π_A. The analogous condition holds for a player j ∈ B. In the opposite type monomorphic state, where each group plays the other group's favorite action, b is never the best reply while a is; hence, that state cannot be a strict equilibrium.
Concerning states where not all players of a group play the same action against the same group, the claim is easy to prove. Indeed, by definition, in these states either not all players are playing their best reply action, or some players are indifferent between two or more behaviors. In the first case, the state is not a strict equilibrium by definition; in the second case, the equilibrium is not strict, since more than one behavior is a best reply simultaneously. Hence, such states cannot be strict equilibria. We are left with the seven states where every player of each group plays the same action against the same group, such as (N_A, 0, N_B, N_B). It is easy to prove that these states fall into the category of states where not every player is playing her/his best reply; therefore, they cannot be strict equilibria.
Strict equilibria are always absorbing states.
We first prove the necessary and sufficient conditions for a state to be a fixed point, and second that every fixed point is an absorbing state. To prove sufficiency, we rely on the definition of strict equilibrium: in a strict equilibrium, every player is playing her/his BR, and no one has an incentive to deviate. Whoever is given the revision opportunity does not change her/his behavior; therefore, F(ω_t, θ_{t+1}) = ω_t. To prove necessity, consider a state that is not a strict equilibrium: by definition, not all the players are playing their BR. Among such states, consider first those with no indifferent players. In this case, with positive probability, one or more players who are not playing their BR are given the revision opportunity and change action; therefore, F(ω_t, θ_{t+1}) ≠ ω_t for some realization of θ_{t+1}. In states where some players are indifferent between two or more behaviors, thanks to the tie rule, there is a positive probability that an indifferent player is given the revision opportunity and, since s/he randomizes her/his choice, changes the behavior s/he is currently playing. Knowing this, we are sure that no state outside the strict equilibria can be a fixed point. In our case, a fixed point is also an absorbing state: every fixed point absorbs at least the states in which all players except one are playing the corresponding behavior, since, if that remaining player is given the revision opportunity, s/he switches for sure to the behavior played by everyone else.
Here, we prove the results of the stochastic stability analysis of Section 4.
Proof of Theorem 2. We split the absorbing states into two sets, M_1 and M_2, and then apply Theorem 1 of Ellison [11]. R(M_1) is the minimum number of mistakes needed to escape the basins of attraction of both PS_a and PS_b. The size of these basins of attraction is determined by the value of c. In a state inside D(PS_a), ba is the BR for B and a is the BR for A; similarly, inside D(PS_b), ab is optimal for A and b is optimal for B. The minimum resistance paths that start in PS_a and PS_b and exit their basins of attraction involve ε mistakes.
We calculate the size of these basins of attraction for 0 < c < (1/(N − 1)) min{π_A, π_B}. We start from PS_a; the argument stands for the other states in PS by symmetry of the payoff matrix.
Firstly, we consider the minimum number of mistakes that makes a the BR for B players.
Consider the choice of a B player inside the category of states where n^{BB} is low. Referring to Equations (5)-(8), acquiring the information (τ = 1) is the best reply for B players when n^{AB} ∈ (N_A Π_B/(Π_B + π_B), N_A]. Therefore, a path toward a state where n^{BB} ≥ (N_B Π_B + π_B)/(Π_B + π_B) is a transition out of the basin of attraction of PS_a. Starting from n^{BB} = 0, the cost of this transition is (N_B Π_B + π_B)/(Π_B + π_B). This cost is counted in ε mistakes since, once in PS_a, it is sufficient that this number of B players plays b by mistake. Another possible path is to make ba the BR for A. With similar arguments, it is possible to show that the cost of exiting M_1 starting from PS_b is the same. We can also show that the minimum resistance path exiting the basin of attraction of M_2 reaches either PS_a from MS_a or PS_b from MS_b. Since R(M_1) > R(M_2) for every value of payoffs and group sizes, the stochastically stable state must be in M_1.

Analysis with M_1 and M_2
Let us consider the path that goes from M_1 to PS_a. Starting in PS_b, it is sufficient that enough players from A play a by mistake for a transition from PS_b to D(PS_a) to occur.
With a similar argument, it can be shown that the radius of the reverse path takes the corresponding value.

Proof of Theorem 3. In this case, R(MS_a) = CR(MS_b) and R(MS_b) = CR(MS_a). Therefore, we just need to calculate the two radii.
Radius of each state.
Let us consider R(MS_a). Since the basin of attraction of MS_a is a region where a is the best reply behavior for both groups, many players must make a mistake before b becomes the BR for one of the two groups. As long as too few players play b, a is still the best reply ∀ i ∈ A, and therefore there is a path of zero resistance back to MS_a. Nevertheless, once b is the BR for B, it can happen that only B players are given the revision opportunity and that they all choose behavior b. This creates a path of zero resistance to a state (n̄^{AA}, n̄^{AB}, 0, 0). Once in this state, if n̄^{AA} < (Nπ_A − π_A)/(Π_A + π_A), the state is in the basin of attraction of MS_b. This happens only if (Nπ_A − π_A)/(Π_A + π_A) + N_B = (NΠ_B − Π_B)/(Π_B + π_B). More generally, considering k ≥ 0, this happens if (Nπ_A − π_A)/(Π_A + π_A) + N_B = (NΠ_B − Π_B)/(Π_B + π_B) − k. Fixing payoffs and group sizes, k = (NΠ_B − Π_B)/(Π_B + π_B) − (Nπ_A − π_A)/(Π_A + π_A) − N_B; hence, the cost of this path follows. With a similar reasoning, we can compute R(MS_b). We now prove that all the other paths, which involve η mistakes, are costlier than those involving only ε mistakes. We know that a is the BR in every state inside the basin of attraction of MS_a, that nobody in the basin of attraction of MS_a optimally buys the information, and that every player who has bought the information (by mistake) plays behavior aa. Every path with an η mistake also involves an ε mistake, and hence its cost is double that of the path described above.
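The thresholds and the slack k used above can be computed directly from the payoff parameters. This is a numerical sketch: the parameter names are ours, the formulas are taken verbatim from the text, and the ceiling rounding is assumed, following the paper's remark that N is taken large enough to compare arguments inside ceiling functions.

```python
from math import ceil

def theorem3_thresholds(N, N_B, Pi_A, pi_A, Pi_B, pi_B):
    """Quantities from the proof of Theorem 3, as stated in the text.

    t_A : (N*pi_A - pi_A)/(Pi_A + pi_A), the level of n^AA below which
          the intermediate state lies in the basin of attraction of MS_b.
    t_B : (N*Pi_B - Pi_B)/(Pi_B + pi_B).
    k   : the slack t_B - t_A - N_B used to price the mistake path.
    """
    t_A = ceil((N * pi_A - pi_A) / (Pi_A + pi_A))
    t_B = ceil((N * Pi_B - Pi_B) / (Pi_B + pi_B))
    return t_A, t_B, t_B - t_A - N_B
```

Plugging in symmetric payoffs, say Π = 3 and π = 1 with N = 10 and N_B = 4, gives t_A = 3, t_B = 7, and k = 0.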
Conditions for stochastically stable states.

$MS_a$ is stochastically stable if and only if $R(MS_a) > CR(MS_a)$; since $CR(MS_a) = R(MS_b)$, this is verified when $R(MS_a) > R(MS_b)$. Therefore, we conclude that $MS_a$ is stochastically stable in the above scenario, while if the opposite inequality holds, $MS_b$ is stochastically stable.
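The selection rule of Theorem 3 thus reduces to a comparison of the two radii. A minimal sketch (the function name is ours):

```python
def selected_state(R_MS_a: float, R_MS_b: float) -> str:
    """Radius-coradius test when CR(MS_a) = R(MS_b) and CR(MS_b) = R(MS_a):
    the state with the larger radius is stochastically stable."""
    return "MS_a" if R_MS_a > R_MS_b else "MS_b"
```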
Proof of Theorem 4. We first calculate the radius, coradius, and modified coradius for the three states we are interested in, and then we compare them to draw inferences about stochastic stability.
Radius of each state.
The radius of $MS_a$ is the minimum number of mistakes that makes $b$ the best reply for B players. This number is $\frac{N\pi_B + \Pi_B}{\Pi_B + \pi_B}$. The alternative is to make $b$ the best reply for A: hence, a path to state $(0, 0, N_B, N_B)$, and then to $(0, 0, 0, 0)$. The number of $\varepsilon$ mistakes for this path is $\frac{N\Pi_A + \pi_A}{\Pi_A + \pi_A}$. Therefore, $R(MS_a) = \frac{N\pi_B + \Pi_B}{\Pi_B + \pi_B}$. With a similar reasoning, we can conclude that $R(MS_b) = \frac{N\pi_A + \Pi_A}{\Pi_A + \pi_A}$.

Consider $TS$: the minimum-resistance path out of its basin of attraction reaches either $MS_a$ or $MS_b$, depending on payoffs. In other words, the minimum number of mistakes to exit from $D(TS)$ is the one that makes either $a$ or $b$ the best reply. Consider the path from $TS$ to $MS_a$: in this case, some mistakes are needed to make $a$ the best reply for B. The state in which $a$ is the best reply for B depends on payoffs and group sizes. In a state $(N_A, N_A, k, k)$, $a$ is the best reply for every player in B if $(N_A + k - 1)\pi_B > (N - N_A - k)\Pi_B$. This inequality is obtained by adapting Equations (5)-(8), comparing B playing $a/ab$ with $b/ba$. Fixing payoffs, we can calculate the exact value of $k$, which is $\frac{N_B\Pi_B - N_A\pi_B + \pi_B}{\Pi_B + \pi_B}$; this is the cost of the minimum-mistake transition from $TS$ to $MS_a$. With a similar argument, the cost of the minimum-mistake transition from $TS$ to $MS_b$ is $\frac{N_A\Pi_A - N_B\pi_A + \pi_A}{\Pi_A + \pi_A}$.

There are no paths involving $\eta$ mistakes that are cheaper than the two proposed above. The intuition is the following. Consider a situation in which $m$ players of A are given the revision opportunity in one period, and they all choose to buy the information. In this case, they all optimally choose behavior $ab$. This means that, at the cost of these $\eta$ mistakes, there is a path to a state in which $N_A - m$ players are playing $b$ against B; in this state, $b$ is still the best reply for group B, while $a$ is still the best reply for A. Hence, from that state, there is a path of zero resistance back to $TS$. The same happens when B players choose by mistake to buy the information. Therefore, $R(TS) = \min\left\{\frac{N_B\Pi_B - N_A\pi_B + \pi_B}{\Pi_B + \pi_B}, \frac{N_A\Pi_A - N_B\pi_A + \pi_A}{\Pi_A + \pi_A}\right\}$.

Coradius of each state.
We start from $TS$: in this case, we have to consider the two minimum-resistance paths to reach it from $MS_a$ and $MS_b$, whose costs are $\frac{N\pi_A + \Pi_A}{\Pi_A + \pi_A}$ and $\frac{N\pi_B + \Pi_B}{\Pi_B + \pi_B}$. Firstly, the argument proving that these two are the minimum-resistance paths to reach $TS$ from $MS_a$ and $MS_b$ is given in the previous part of the proof. Secondly, we have to prove that this is the maximum among the minimum-resistance paths starting from any other state and ending in $TS$. There are three regions from which we can start and end up in $TS$: the basin of attraction of $MS_b$, that of $MS_a$, and all the other states that are not in the basins of attraction of the three states considered. From this last region, there is always a positive probability of ending up in $MS_a$, $MS_b$, or $TS$; hence, we can take the cost of reaching $TS$ from it to be 0. The other two regions are the ones considered above, and since we are taking the maximum path to reach $TS$ from any other state, we have to take the sum of these two. Hence, $CR(TS) = \frac{N\pi_A + \Pi_A}{\Pi_A + \pi_A} + \frac{N\pi_B + \Pi_B}{\Pi_B + \pi_B}$. Let us now consider the two $MS$. Similarly to the two previous proofs, we can focus only on $\varepsilon$ paths. Note that in this case, $TS$ always lies between the two $MS$. Let us start from $MS_b$: here we can consider three different paths starting from any state and arriving at $MS_b$. The first one starts in $TS$, the second starts in a state outside the basins of attraction of the three absorbing states, and the last starts in $MS_a$. In the second case, there is at least one transition of zero resistance to $MS_b$. Next, assume we start in $TS$: the minimum number of mistakes to reach $MS_b$ from $TS$ is the one that makes $b$ the best reply for A players, that is, $\frac{N_A\Pi_A - N_B\pi_A + \pi_A}{\Pi_A + \pi_A}$. Now, we need to consider the case of starting in $MS_a$. Firstly, consider the minimum number of mistakes to make $b$ the best reply for A players. This number is $\frac{N\Pi_A + \pi_A}{\Pi_A + \pi_A}$.
Secondly, consider the minimum number of mistakes to make $b$ the best reply for B players, and then, once $TS$ is reached, the minimum number of mistakes that makes $b$ the best reply for A players. Hence, $\min r(MS_a, MS_b) = \min\left\{\frac{N\Pi_A + \pi_A}{\Pi_A + \pi_A}, \frac{N\pi_B + \Pi_B}{\Pi_B + \pi_B} + \frac{N_A\Pi_A - N_B\pi_A + \pi_A}{\Pi_A + \pi_A}\right\}$. Since both terms in this expression are greater than $\frac{N_A\Pi_A - N_B\pi_A + \pi_A}{\Pi_A + \pi_A}$, we can say that $CR(MS_b) = \min r(MS_a, MS_b)$.
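The radius and coradius expressions in this proof can be collected in a short Python sketch. This is our own illustration, assuming the fractional thresholds with denominators $\Pi + \pi$ as they appear in the proof; the function names are not from the paper.

```python
def R_MS_a(N: int, Pi_B: float, pi_B: float) -> float:
    # Minimum mistakes making b the best reply for B, starting from MS_a.
    return (N * pi_B + Pi_B) / (Pi_B + pi_B)

def R_MS_b(N: int, Pi_A: float, pi_A: float) -> float:
    # Symmetric expression for MS_b.
    return (N * pi_A + Pi_A) / (Pi_A + pi_A)

def R_TS(N_A: int, N_B: int, Pi_A: float, pi_A: float,
         Pi_B: float, pi_B: float) -> float:
    # Cheapest exit from D(TS): towards MS_a or MS_b, whichever is cheaper.
    return min((N_B * Pi_B - N_A * pi_B + pi_B) / (Pi_B + pi_B),
               (N_A * Pi_A - N_B * pi_A + pi_A) / (Pi_A + pi_A))

def CR_TS(N: int, Pi_A: float, pi_A: float,
          Pi_B: float, pi_B: float) -> float:
    # Sum of the two minimum-resistance paths from MS_a and MS_b.
    return R_MS_b(N, Pi_A, pi_A) + R_MS_a(N, Pi_B, pi_B)
```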
Reaching a state where $b$ is the best reply for group A from $TS$ is certainly less costly than reaching it from $MS_a$, since in $TS$ more people are playing $b$. The maximum-cost path of minimum resistance from each $MS$ to the other passes through $TS$. Hence, for each $MS$, we need to subtract from the coradius the cost of passing from $TS$ to the other $MS$: this follows from the definition of the modified coradius. Let us consider $CR^*(MS_a)$: we subtract from $CR(MS_a)$ the cost of passing from $TS$ to $MS_b$, so that $CR^*(MS_a) = CR(MS_a) - \frac{N_A\Pi_A - N_B\pi_A + \pi_A}{\Pi_A + \pi_A}$. Similarly, $CR^*(MS_b) = CR(MS_b) - \frac{N_B\Pi_B - N_A\pi_B + \pi_B}{\Pi_B + \pi_B}$.
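The modified-coradius step described above amounts to a simple subtraction. A minimal sketch (our own illustration; the function name is not from the paper):

```python
def CR_star(CR_MS: float, cost_TS_to_other_MS: float) -> float:
    """Modified coradius: subtract from the coradius the cost of the
    intermediate passage from TS to the other MS."""
    return CR_MS - cost_TS_to_other_MS

# Example: a coradius of 10 mistakes with a TS step costing 4 leaves CR* = 6.
```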